Today, we release a new version of our malware clustering solution: zynamics VxClass 1.5[1].
Compared to the previous release (VxClass 1.3), and aside from fixing tons of bugs, version 1.5 comes with new and upgraded system software and an updated BinDiff-based differ component, and it finally switches to IDA Pro 6.0.
Here are some more things we changed:
- Updated the base system to the new Debian 6.0 (“Squeeze”)
- Upgraded to Python 2.7.1 for VxClass base code
- Upgraded to IDA Pro 6.0, IDAPython 1.4.3 and pefile 1.2.10-96
- Updated the diffing engine to the newest BinDiff version
- Performance improvements when using VxClass in a cluster
- Huge performance improvements: A single machine can now easily process 8000 samples a day.
- New top-10 most similar visualization (more below)
And of course this release also includes the signature generation component Thomas blogged about here, here and here.
With improved performance, mining the data of a VxClass system that contains tens of thousands of malware samples provides more and more insight. Unfortunately, up until now, this has been more difficult than necessary. For example, answering a question like “What are the 10 most-similar malware samples for this particular sample?” usually meant writing a custom Python script that accesses the built-in XML-RPC interface.
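To give an idea of what such a custom script involves, here is a minimal sketch. It is not the actual VxClass API: the endpoint URL, the method name `get_similarities` and the shape of its return value (a mapping from sample ID to similarity score) are all assumptions made up for illustration. It uses Python 3's standard `xmlrpc.client` module.

```python
import heapq
from xmlrpc.client import ServerProxy  # standard-library XML-RPC client


def top_similar(similarities, n=10):
    """Return the n (sample_id, score) pairs with the highest score.

    `similarities` is assumed to map sample IDs to similarity scores.
    """
    return heapq.nlargest(n, similarities.items(), key=lambda item: item[1])


def fetch_top_similar(server, sample_id, n=10):
    """Ask the server for all similarity scores of one sample, keep the top n.

    `get_similarities` is a hypothetical method name; a real script would
    use whatever call the XML-RPC interface actually exposes.
    """
    scores = server.get_similarities(sample_id)
    return top_similar(scores, n)


# Example usage (hypothetical URL and sample ID, not executed here):
#   server = ServerProxy("http://vxclass.example/xmlrpc")
#   for other_id, score in fetch_top_similar(server, "sample-1234"):
#       print(other_id, score)
```

Even in this simplified form, the ranking has to happen on the client side, which is exactly the kind of boilerplate the new visualization makes unnecessary.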
Starting with this release, we will keep adding visualization options for the data in the system. We start off simple, with the top-10 most-similar and Venn diagram visualizations.
When using the “family tree” applet that has been present in VxClass since the first release, another “caveat” of being able to process tens of thousands of samples becomes apparent: humans cannot visually process several thousand clusters organized in a graph. Thus, the next release will contain an alternate visualization for the “family tree”. I’ll leave the details to another blog post.
All existing customers within their support periods will receive an e-mail regarding the upgrade in the next few days.
[1] Following the naming tradition of Ubuntu, VxClass releases are code-named with an alliteration. This release is code-named “exploited emu”, following version 1.3’s “malicious monkey”.