Author Archive

BinDiff 4.0 available today :-)


After several months of silence due to our team moving, finding a new home, and generally working really hard, we are happy to announce today that a new version of BinDiff is available! While the underlying comparison engine has only changed slightly, we have some significant improvements on the UI, and some improvements that are particularly useful for porting symbolic information from FOSS libraries into your disassemblies. In the following, I will highlight my favourite new features:

Call graph difference visualisation

With more complex differences between two executables, it is sometimes easy to miss the big picture by drilling down too much on changes to individual functions. With BinDiff 4.0, I now have the ability to not only examine changes on the level of the individual function, but also on the call graph. As with most UI improvements, an image is much more useful than a long diatribe; I will let the following screenshot speak for itself:

Examining changes on the callgraph level

Combined visualization of two flowgraphs

Ever since the very first version of BinDiff, the only way to examine a change in a flowgraph was by using our split-screen approach: One function on each side, laid out in a similar manner, with colors indicating changes. While this works pretty well (and is still my favorite way of looking at changes), it is sometimes a bit cumbersome. In the new UI, we added an additional way of examining changes: We merge the two graphs into one, and have a vertical split on the basic block / node level. This allows full-screen examination of changes without the need for splitting the screen.

The combined visualisation of changes

Iterative diffing

Over the last years, symbol porting has eclipsed patch analysis as my primary use for BinDiff. In many situations, I need to pull information from a FOSS project into an existing disassembly. I usually compile the FOSS project with symbols, attempting to approximate the build settings of the executable I am analyzing. I then BinDiff the disassembly against the compiled FOSS library and selectively import symbols and names for the functions that were recognized properly. While BinDiff often produces pretty good results, only a fraction of the functions will be recognized properly. In such situations, I often wished I could assist BinDiff infer further matches. With BinDiff 4.0, I can do just that: I can confirm that a pair of functions are matched correctly, and then tell BinDiff to re-run with the confirmed functions as starting points for further inference. This iterative approach allows me to match more and more functions while porting my symbols, yielding a much larger percentage of symbols in my disassembly than what would have been achieved in a single round of comparison.

Confirming a few matches

After confirming, click in the "Diff Database Incrementally" button

More Pie Charts

When comparing two pieces of related code, it is often useful to obtain a quick overview of the degree of code overlap between two files. What fraction of the functions in an executable could be mapped to the other executable? How similar were these functions? While all this information is available to BinDiff, up until the new version we never visualized this information in a central location. This has changed with the new UI – we now generate pretty pie charts, almost instantly usable in your favorite presentation software.

Pretty pies !

There are other new features in the UI – just give it a spin. After all, BinDiff is now directly available from our website and the price has been lowered to just 200 USD!

zynamics acquired by Google !


We’re pleased to announce that zynamics has been acquired by Google! If you’re an existing customer and do not receive our email announcement within the next 48 hours, please contact us at All press inquiries should be sent to

Memoryze + VxClass vs Zeus


I have previously blogged about VxClass and our algorithms for automated generation of byte signatures here and here and here. I have also blogged about private signatures beforehand, a concept that I think has great relevance for defense against (and response to) targeted attacks. One point I left open in the previous blog post was the following question:

How do I actually use the byte signatures in a real-world scenario?

In this post, I will answer that question: We will use the latest version of Mandiant’s Memoryze and VxClass and walk through the entire process of memory acquisition, malware classification, signature generation and signature deployment. In detail, our steps are going to be the following:

  1. We will pre-populate a VxClass system with a good quantity of Zeus samples
  2. We will examine the clusters generated from this
  3. We infect a previously clean XP with a new Zeus sample
  4. We identify the suspicious memory sections that were created by Zeus using Memoryze and AuditViewer, two great (and free!) tools that Mandiant has released
  5. We acquire the injected memory sections from the infected system and submit them to VxClass
  6. VxClass recognizes the similarity to previously submitted Zeus samples and generates a byte signature to detect the entire cluster of both old and new Zeus variants. This signature is unique to us, and can be used to detect infections by this malware on any Windows machine.
  7. The signature is then fed into Memoryze to identify the infection on other machines

Wow, that’s quite a list. So where do we start? We start out by examining a small set of about 180 malware samples that we pre-populated a fresh VxClass with. These samples were labeled “Zeus” or “Zbot” by some anti-virus software, so we assume they are Zeus variants.VxClass has generated family trees, and assigned most of the files to these trees.

An overview of the family trees in the system

In the next two pictures, it will become evident what the “similarity score” between two samples means: It indicates how much overlap there is between the code of the two executables under consideration. To further illustrate the issue, we drew some Venn diagrams illustrating the overlap between the highlighted samples on the right hand side.

Two pieces of malware and how they overlap

The next screenshot shows where in the family tree these two items are located (you might have to load the non-thumbnail version to actually see this):

But enough of this. As a next step, I infect a vanilla XP machine with a new random sample that was labelled Zeus — one that VxClass has never seen before. After having done this, I use Memoryze (and AuditViewer, a nice GUI for it) to identify the memory sections that have been injected into various processes. To do this, you have to click through a few dialogs that ask you to configure the following things:

  1. The location of Memoryze
  2. The output path for the data
  3. Whether you wish to work from live memory (yes)

In the first step, we just want to identify those processes that Zeus injected itself into – we will acquire the injected memory separately. We hence check “Process Enumeration” and leave the “Acquisition” fields unchecked. Finally, we enable “Memory Sections” and “Detect Injected DLLs” to be enumerated for each process. AuditViewer now launches Memoryze, and after a brief waiting period we can inspect the results. AuditViewer is nice enough to highlight the processes that contain suspicious injected DLLs in red. Furthermore, we can immediately spot the problematic section of memory, because AuditViewer has highlighted it, too.

AuditViewer highlights the suspicious memory regions

In the next step, we simply upload the .VAD file of this memory section to our VxClass box and look whether this is in any way similar to the code already in the box. And no surprise: It is quite similar to a number of other executables in the database:

The .VAD file was clustered close to other executables

In the next step, we will want to generate a traditional byte signature for all executables in the relevant cluster. This is pretty easy — highlight them, assign a tag to them, and then right-click “create signature” (screenshot below).

Making VxClass generate a signature automatically

A few seconds later, the signature is ready and can be downloaded (in ClamAV format) from the “Signatures” tab in the VxClass web UI. The generated signature is the following:


This is clearly much longer than what would be strictly required – we usually generate much longer signatures than the bare minimum in order to minimize the risk of false positives. The cool thing about the signature is that it is private — e.g. no other user of VxClass would get the same signature (to be precise: the probability of another user getting the same signature is astronomically small). This means that unless you share this signature (like I did above), it remains as a “secret weapon” in your arsenal: The malware authors do not know what signature you are going to detect them with, so they can’t intentionally “break” this signature.

Now, an important question remains unanswered:

How does one deploy such a signature?

We have two options: We can use AuditViewer to configure Memoryze again, or we can simply run Memoryze from the command line with the appropriate configuration file to scan through the physical memory of a machine. In order to use AuditViewer, simply configure it as usual: Provide it with the location of Memoryze and tell it to analyze live memory. After you have done so, you mark the “enumerate processes” checkbox, and finally, you tell AuditViewer about the byte pattern you wish to search for:

Configuring AuditViewer to scan for a signature

AuditViewer will then launch Memoryze in the background which will scan through all processes and identify those that contain the pattern in question.

AuditViewer saves the current configuration for Memoryze in a file called out.xml in your Memoryze directory. This means you can simply make a copy of out.xml in your Memoryze directory and re-use it on other machines without having to re-run AuditViewer. Simply install Memoryze on a machine and then launch “Memoryze -script out.xml -o <outputdir>“.

You now have a new way of detecting variants of the malware that was used to attack you. But best of all: This method is secret – the signature isn’t shared with the wider world – and the attacker therefore has a much harder time immunizing his attack tools prior to the next attack.

To summarize: Combining VxClass with Memoryze and AuditViewer makes the acquisition and correlation of malicious code easy – but best of all, these tools also provide a quick and convenient way to automatically generate high-quality detection mechanisms that are kept secret from the attackers.

Recovering UML diagrams from binaries using RTTI – Inheritance as partially ordered sets


Wow, it’s been a while since we last blogged. Ok, time to kick off 2011 🙂

A lot of excellent stuff has been written about Microsoft’s RTTI format — from the ISS presentations a few years back to igorsk’s excellent OpenRCE articles. In the meantime, RTTI information has “spread” in real-world binaries as most projects are now built on compilers that default-enable RTTI information. This means that for vulnerability development, it is rare to not have RTTI information nowadays; most C++ applications come with full RTTI info.

So what does this mean for the reverse engineer ? Simply speaking, a lot — the above-mentioned articles already describe how a lot of information about the inheritance hierarchy can be parsed out of the binary structures generated by Visual C++ — and there are some pretty generic scripts to do so, too.

This blog article is about a slightly different question:

How can we recover full UML-style inheritance diagrams from executables by parsing the RTTI information ?

To answer the question, let’s review what the Visual C++ RTTI information provides us with:

  1. The ability to locate all vftables for classes in the executable
  2. The mangled names of all classes in the executable
  3. For each class, the list of classes that this class can be legitimately upcast to (e.g. the set of classes “above” this class in the inheritance diagram)
  4. The offsets of the vftables in the relevant classes

This is a good amount of information. Of specific interest is (3) — the list of classes that are “above” the class in question in the inheritance diagram. Coming from a mathy/CSy background, it becomes obvious quickly that (3) gives us a “partial order”: For two given classes A and B, either A ≤ B holds (e.g. A is inherits from B), or the two classes are incomparable (e.g. they are not part of the same inheritance hierarchy). This relationship is transitive (if A inherits from B, and B inherits from C, A also inherits from C) and antisymmetric (if A inherits from B and B inherits from A, A = B). This means that we are talking about a partially ordered set (POSet)

Now, why is this useful ? Aside from the amusing notion that “oh, hey, inheritance relationships are POSets“, it also provides us with a simple and clear path to generate readable and pretty diagrams: We simply calculate the inheritance relation from the binary and then create a Hasse Diagram from it — in essence by removing all transitive edges. The result of this is a pretty graph of all classes in an executable, their names, and their inheritance hierarchy. It’s almost like generating documentation from code 🙂

Anyhow, below are the results of the example run on AcroForm.API, the forms plugin to Acrobat Reader:

The full inheritance diagram of all classes in AcroForm


A more interactive (and fully zoomable) version of this diagram can also be viewed by clicking here.

For those of you that would like to generate their own diagrams, you will need the following tools:

Enjoy ! 🙂

Challenging conventional wisdom on AV signatures (Part 2 of 2)


A while ago I posted a blog entry called “challenging conventional wisdom on AV signatures (Part 1 of 2)”. There, I argued that the fundamental problem with traditional AV byte signatures is the of lack of information asymmetry: The defender and the attacker both have access to the same information, and the attacker can run a potentially infinite number of test-runs to make sure he can reliably bypass all of defensive measures the defender has taken.

The important thing to take away from that blog post is that the problem with AV signatures is not inherent to “signatures” – it is a matter of information symmetry.

Now, how can one change this situation? Is there a clever way to make traditional byte signatures useful again? Can we somehow introduce information asymmetry in a productive manner?

To investigate this, we have to remember another blog post where I described some of our results on generating “smart” signatures (this appears to be AV lingo for signatures that are not checksum-based, but which consist of bytes and wildcards). The summary of this blog post is more or less: “With the algorithms underpinning VxClass, we can not only automatically cluster malicious software into groups, we can also generate signatures for each group automatically. And one signature will match the entire group.”

There was one small bit of information missing in that post that will make this post interesting: We can usually generate dozens, if not hundreds, of different signatures for the same cluster of malware. These signatures match, by construction, on all samples of a particular cluster, but they have nothing in common – they match on different bits of the code.

Where does this leave us? Well, it leaves us with a pretty cool system that we call VxClass for Financials (although it is possible to substitute ‘Financials’ with other large verticals that are often victims of targeted attacks). The system works as follows:

  1. Different financial institutions each get a user account on a centralized VxClass server
  2. Users upload malicious software that they have recovered (using tools such as Memoryze) from their own systems
  3. Users are anonymous by default
  4. Users can see how malware they upload clusters; they can also see how similar their malware is to malware other users uploaded
  5. Users can only download their own malware, not the malware of other users
  6. For each cluster, users can generate a personalized detection signature that no other user will ever see

Why is this cool ? Well, for one thing, every user profits from uploading to the system — the more samples are present in one cluster, the better the predictive power of the signatures. At the same time, users do not have to share any confidential information with each other — they are encouraged to, but they do not have to. Finally, even if some users of the system are sloppy and leak their signatures to the attacker, they only endanger themselves – everybody else has their own signatures, and will thus not be affected by this signature leak. This is important – normally, when I share methods of detection with others, I risk losing them. Not here.

We are starting an evaluation/beta program of the system in the next 1-2 weeks — at the moment, targeted at the financial sector. If you happen to find this interesting, are working for a financial institution and want to participate in our test drive, please contact us at !

Las Vegas & the zynamics team


Along with RECon, the single most important date in the reverse engineering / security research community is the annual Blackhat/DefCon event in Las Vegas. Most of our industry is there in one form or the other, and aside from the conference talks, parties and award ceremonies, there’s also a good amount of technical discussions (in bars or elsewhere) that takes place.

This year, a good number of researchers/developers from the zynamics Team will be present in Las Vegas — alphabetically, the list is:

  1. Ero Carrera
  2. Thomas Dullien/Halvar Flake
  3. Vincenzo Iozzo
  4. Tim Kornau

So, if you wish meet any of the team to discuss reverse engineering, our technologies, our research, or the performance of the Spanish or German football team at the last world cup, do not hesitate to drop an email to — Vegas is always chaotic, and scheduling a meeting will minimize stress for everyone that is involved.

Specifically, the following topics are specifically worth meeting over:

  1. Chat with Ero over our unpacking engine (just presented at RECon) — and how it fits into the larger scheme of things (e.g. VxClass)
  2. Meet with Tim or Vincenzo to discuss automated gadget-finding for ROP, or anything involving the ARM/REIL translations
  3. Meet with Thomas/Halvar to discuss VxClass, automated malware clustering, automated generation of “smart” malware signatures etc.

Aside from this, if you are interested in …

  • … boosting your reverse engineering performance by porting symbols from FOSS software into your closed-source disassemblies (BinDiff)
  • … becoming faster at finding bugs by leveraging differential debugging, the REIL intermediate language and static analysis frameworks (BinNavi)
  • … enhancing team-based reverse engineering by pooling accumulated knowledge and sharing information (BinCrowd)
  • … automatically correlating and clustering malware and forensically obtained memory dumps, and automatically deriving detection mechanisms (VxClass)
  • … analyzing malicious PDF files including the embedded JavaScript code (PDF Dissector)

then do not hesitate to drop us mail — we’ll gladly show/explain what our tools/technologies can do.

See you there !

At zynamics, we like good offense …


… and therefore we are happy to have sponsored Shawn Dean so he could go to the Wajutsu Keishukai Grappling Tournament in Tokyo – which HE WON.  We are happy to have had to opportunity to sponsor him and even happier to see him succeed.

Also, it is great to see BinNavi-embroidered shorts on the winner 😛

Watch it yourself here:

Shawn Dean receives honors

  • Shawn Dean in action

    Challenging conventional wisdom on AV signatures (Part 1 of 2)


    We have all been taught (and intuitively felt) that traditional antivirus signatures are, for the most part, a waste of time. I think I have myself argued something similar repeatedly. One could say that “byte signatures don’t work” is accepted conventional wisdom in the security industry. Especially in the light of the recent (and much-publicized) Aurora-attacks, this conventional wisdom appears to ring truer than ever.

    One thing though that I have personally always liked about the security industry was the positive attitude towards challenging conventional wisdom — re-examining the assumptions underlying this wisdom. In this post (and the upcoming sequel), I will do just that: I will examine the reasons why everyone is convinced that traditional byte signatures do not work and ask questions about the assumptions that lead us to this conclusion.

    So. Why do we think that traditional antivirus signatures are a waste of time ?

    Let’s first recapitulate what the usual cycle in a targeted attack consists of:

    1. The attacker writes or obtains a backdoor component
    2. The attacker writes or obtains an exploit
    3. The attacker tests both exploit and backdoor against available AV tools, making sure that both are not detected
    4. The attacker compromises the victim and starts exfiltrating data
    5. The defender notices the attack, passes the backdoor to the AV company, and cleans up his network
    6. The AV company generates a signature and provides it to both the attacker and defender
    7. Goto (2)

    This entire cycle can be clarified with a few pictures:

    Let’s look at this diagram again. What properties of “byte signatures” does the attacker exploit in immunizing his software ? Well … none, really, except the fact that he fully knows about them before launching his attack. There is no information asymmetry: The attacker has almost the exact same information that the defender has. Through this, he is provided with a virtually limitless number of trial runs of his attack, and he can adapt his attack arbitrarily, over long time periods, until he is reasonably certain that it will be both successful and undetected.

    The implication of this is that the underlying problem is not a feature of byte signatures, but rather a generic problem inherent in all security systems that provide identical data to both attackers and defenders and that have no information asymmetry at all.

    In the next post, I will examine two approaches how this problem could be addressed to give the defender an advantage.

    Ralf-Philipp Weinmann & Vincenzo Iozzo own the iPhone at PWN2OWN


    Hey all,

    this is just a quick announcement that Ralf-Philipp Weinmann (a postdoctoral researcher at the University of Luxembourg) and Vincenzo Iozzo (a researcher at zynamics :-)) owned the iPhone at PWN2OWN today.

    A bug in Safari was exploited that extracted the SMS database from the phone and uploaded it to a server.

    Vincenzo will write more about the payload construction process once the dust settles — fittingly, the payload used chained return-into-libc (“return oriented programming”) on ARM to execute in spite of code signing. As far as we know, this is the first public demonstration of chainged return-into-libc on thre ARM platform.
    I am happy and proud to be able to work with great people (Ralf happens to be a BinNavi/BinDiff user, and Vincenzo is “our youngest” employee).  Now we’ll celebrate for a bit and then prepare tomorrow’s talk.

    Here’s a press release and ZDI’s blog post about pwn2own.



    VxClass, automated signature generation, RSA 2010


    Everybody is convening in San Francisco next week for RSA2010 it appears — the big annual cocktail & business card exchange event. If you are interested in any of our technology (automated malware classification, automated signature generation, BinDiff, BinNavi) and would like to meet up with me, please contact 🙂