Ten years of innovation in reverse engineering

On our way back home from Black Hat Europe in Barcelona, Thomas and I were brainstorming about the most important changes to the field of binary code reverse engineering in the last 10 years. What has changed since 2000? What made the biggest impact? Remember: Back in the dark days of 2000, W32Dasm and Turbo Debugger were considered good reverse engineering tools. If you had a self-written tracer that logged the execution of conditional jumps, you were basically a king.

Anyway, we came up with several trends and technologies we believe have changed the job of reverse engineers tremendously since 2000. Here they are:

Visual flow graphs for assembly code

First introduced in IDA Pro 4.17 (June 2001), the ability to view disassembled assembly code in graph form made the job of reverse engineers much easier. In essence, using visual flow graphs during reverse engineering raises the level of abstraction and understanding of code while at the same time lowering the required time and effort one has to invest. Before we had graphs we had to reconstruct control-flow structures like loops and if-else statements from linearly listed assembly instructions. With visual flow graphs we can just look at the graph and understand the control flow pretty much immediately.
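
To make that reconstruction step concrete, here is a minimal Python sketch of how a linear listing can be split into basic blocks and edges. The toy instruction format, the mnemonic sets and the name build_flow_graph are all made up for illustration; this is not how IDA Pro or any other tool does it internally, and it assumes the listing is sorted by address and every jump target is in the listing.

```python
# Toy linear listing: (address, mnemonic, jump target or None).
# This format is hypothetical and only serves the example.
LISTING = [
    (0, "mov", None),
    (1, "cmp", None),
    (2, "jz", 5),
    (3, "add", None),
    (4, "jmp", 1),
    (5, "ret", None),
]

CONDITIONAL = {"jz", "jnz"}
UNCONDITIONAL = {"jmp"}
TERMINATORS = {"ret"}


def build_flow_graph(listing):
    """Split a linear listing into basic blocks and control-flow edges."""
    addresses = [addr for addr, _, _ in listing]

    # A block starts at the first instruction, at every jump target,
    # and at the instruction following a jump or a return.
    leaders = {addresses[0]}
    for i, (_, mnem, target) in enumerate(listing):
        if mnem in CONDITIONAL or mnem in UNCONDITIONAL:
            leaders.add(target)
        if mnem in CONDITIONAL | UNCONDITIONAL | TERMINATORS:
            if i + 1 < len(listing):
                leaders.add(addresses[i + 1])

    # Group instructions into blocks keyed by their start address.
    blocks = {}
    current = None
    for addr, mnem, target in listing:
        if addr in leaders:
            current = addr
            blocks[current] = []
        blocks[current].append((addr, mnem, target))

    # Derive edges from the last instruction of each block.
    edges = set()
    starts = sorted(blocks)
    for idx, start in enumerate(starts):
        _, mnem, target = blocks[start][-1]
        fallthrough = starts[idx + 1] if idx + 1 < len(starts) else None
        if mnem in UNCONDITIONAL:
            edges.add((start, target))
        elif mnem in CONDITIONAL:
            edges.add((start, target))
            if fallthrough is not None:
                edges.add((start, fallthrough))
        elif mnem not in TERMINATORS and fallthrough is not None:
            edges.add((start, fallthrough))
    return blocks, edges


blocks, edges = build_flow_graph(LISTING)
for source, destination in sorted(edges):
    print(f"block {source} -> block {destination}")
```

The edge from block 3 back to block 1 is the loop that you would otherwise have to spot by reading the jump targets in the listing.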

In the following years other tools (such as BinNavi) were built around the idea of interacting with flow graphs. Shortly thereafter, the graph engine of IDA Pro was improved (especially in IDA Pro 5.0, March 2006) to provide interactive graphing out of the box.

Python as a scripting language

Back in 2000, most reverse engineering tools were primitive and barely extensible. For disassemblers your best bet was a clumsy IDC implementation in IDA Pro 4. For debuggers the situation looked even bleaker. This all changed with the growing popularity of the scripting language Python and of SWIG, an interface generator that makes it easy to expose an existing C/C++ API to a Python interpreter. The first major step forward I can remember was the creation of the IDAPython plugin for IDA Pro, which added a way to access the IDA API from Python (Gergely Erdelyi, 2004). Later we had tools like Pedram Amini’s PyDbg or Ero Carrera’s pefile that helped popularize the Python language in reverse engineering.

Today, Python is the de facto scripting language of reverse engineering, and many tools from IDA Pro to ImmunityDebugger or BinNavi support Python scripting.

Dynamic Instrumentation

Even though the technology is not brand-new (the first publications describing ‘Dynamo’ go back to 2000), the widespread use of dynamic instrumentation tools like DynamoRIO and Pin for reverse engineering certainly is. Using these frameworks you can build very powerful dynamic analysis tools that allow the monitoring and manipulation of instruction streams in a very transparent and highly efficient way. If you have never used either of these tools, think of them as a way to efficiently receive a callback in a C/C++ program after every instruction. Using these callbacks, you can directly control every aspect of the targeted program while incurring only a small overhead.
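
The callback model can be illustrated without either framework. The sketch below is a toy interpreter in Python with a hook that fires before every instruction, used here to rebuild the kind of conditional-jump tracer mentioned at the top of this post. The three-instruction machine and the names run and log_conditional_jumps are invented for illustration; they have nothing to do with the real DynamoRIO or Pin APIs, which are C/C++ and far more capable.

```python
def run(program, on_instruction):
    """Execute a toy program, calling on_instruction(pc, instr, regs)
    before every single instruction (the 'instrumentation' hook)."""
    regs = {"a": 0}
    pc = 0
    while pc < len(program):
        instr = program[pc]
        on_instruction(pc, instr, regs)
        op = instr[0]
        if op == "set":        # ("set", register, value)
            regs[instr[1]] = instr[2]
            pc += 1
        elif op == "dec":      # ("dec", register)
            regs[instr[1]] -= 1
            pc += 1
        elif op == "jnz":      # ("jnz", register, target)
            pc = instr[2] if regs[instr[1]] != 0 else pc + 1
        else:
            raise ValueError(f"unknown instruction: {instr!r}")
    return regs


# A countdown loop: a = 3; do { a -= 1 } while (a != 0)
PROGRAM = [
    ("set", "a", 3),
    ("dec", "a"),
    ("jnz", "a", 1),
]

trace = []


def log_conditional_jumps(pc, instr, regs):
    """Analysis callback: record whether each conditional jump is taken."""
    if instr[0] == "jnz":
        trace.append((pc, regs[instr[1]] != 0))


run(PROGRAM, log_conditional_jumps)
for pc, taken in trace:
    print(f"jnz at {pc}: {'taken' if taken else 'not taken'}")
```

The analysis code lives entirely in the callback and the program being observed is not modified, which is the property that makes the real frameworks so convenient.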

If you are looking for a new reverse engineering tool to do some research with, dynamic instrumentation might be for you: Working on actual program traces removes a lot of complication in comparison to the static case, and the many different productive uses of dynamic instrumentation are still far from exhausted. While relatively fresh and untapped, dynamic instrumentation tools are definitely a topic people talk about at IT security conferences and elsewhere.

BinDiff-ing

Many years ago, some smart people had a brilliant idea: If you compare an unpatched version of a file to a patched version of the same file, you can easily find what code was changed by the patch and use this information to quickly locate the vulnerabilities the patch fixed. Soon it became evident that new tools were needed to make the process of comparing two versions of the same file as quick and easy as possible. Our own BinDiff tool is maybe the most popular diffing engine for binary code today. However, the idea of comparing files is so popular that a number of free competitors have sprung up over the years. In general, these tools all work in the same way: Once the two input files are disassembled, the functions in file A are matched to the functions in file B and local changes to the matched functions are found and shown to the user.
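
The match-then-diff workflow can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the function names and instruction listings are made up, and the matching here is a greedy comparison of instruction sequences using the standard difflib module, which is a stand-in for the much more robust matching strategies that real diffing tools use.

```python
import difflib

# Hypothetical disassembly of an unpatched and a patched file:
# function name -> list of instructions.
OLD = {
    "sub_401000": ["push ebp", "mov ebp, esp", "call strcpy", "pop ebp", "ret"],
    "sub_401020": ["xor eax, eax", "ret"],
}
NEW = {
    "sub_402000": ["xor eax, eax", "ret"],
    "sub_402010": ["push ebp", "mov ebp, esp", "push 64", "call strncpy",
                   "pop ebp", "ret"],
}


def match_functions(file_a, file_b):
    """Greedily pair each function in file_a with the most similar
    still-unmatched function in file_b."""
    matches = []
    unmatched_b = dict(file_b)
    for name_a, code_a in file_a.items():
        best_name, best_ratio = None, 0.0
        for name_b, code_b in unmatched_b.items():
            ratio = difflib.SequenceMatcher(None, code_a, code_b).ratio()
            if ratio > best_ratio:
                best_name, best_ratio = name_b, ratio
        if best_name is not None:
            matches.append((name_a, best_name, best_ratio))
            del unmatched_b[best_name]
    return matches


def changed_lines(code_a, code_b):
    """Return only the removed ('- ') and added ('+ ') instructions."""
    return [line for line in difflib.ndiff(code_a, code_b)
            if line[0] in "+-"]


for name_a, name_b, ratio in match_functions(OLD, NEW):
    if ratio < 1.0:
        print(f"{name_a} -> {name_b} (similarity {ratio:.2f})")
        for line in changed_lines(OLD[name_a], NEW[name_b]):
            print("   ", line)
```

Identical functions drop out of the report, so the analyst only has to read the handful of functions the patch actually touched.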

BinDiff-style tools are now part of the standard toolbox of many reverse engineers, from vulnerability researchers to malware analysts, and hardly any other technology has risen as spectacularly as this one since 2000.

The end of SoftICE

Back in the day there was just one debugger everybody used for reverse engineering: SoftICE. SoftICE was a wonderful debugger originally written by a company called NuMega from New Hampshire. It allowed you to debug user-land programs as well as kernel-land programs on your own machine without the need for any complicated setup. Later, NuMega was bought by Compuware and SoftICE was discontinued in April 2006.

Of course, newer debuggers have replaced SoftICE today. Microsoft’s own WinDbg, while not nearly as pretty as SoftICE, is the new powerful and popular debugger on the block.

The arrival of the Hex-Rays decompiler

Back in 2000, decompilers sucked. Today, there is Hex-Rays. Back in 2007 the team behind IDA Pro released the first decompiler I am aware of that is actually useful. Since then they have continued to improve the decompiler and they are already showcasing support for ARM decompilation.

While not many people seem to use Hex-Rays yet, this product is definitely one to keep an eye on.

Collaborative Reverse Engineering

Back in 2000, collaborative reverse engineering was unheard of as it was really difficult to exchange reverse engineered information between two databases created by the same program, let alone between different programs. In recent years the situation changed a bit, probably mostly out of necessity. Software today is much more complex than it was ten years ago and very often teams of reverse engineers have to collaborate on the same project.

While still in their infancy, collaborative reverse engineering tools are here to stay and will probably become even more popular in the future. Reverse engineers will pick tools like Chris Eagle’s CollabREate for IDA Pro or our own BinCrowd to share their results with friends and colleagues.

Academic Approaches

Another trend of the last few years is that major universities research topics related to binary code reverse engineering. Among others, the University of California, Berkeley and Carnegie Mellon University have done really impressive work in the last few years. At the same time, reverse engineers in the industry have begun to take note of academic approaches to reverse engineering. While these approaches are not yet in common use in the industry, we know many people and companies that are beginning to look into more formalized approaches to reverse engineering. The popularity of the Reverse Engineering Reddit, maybe the primary resource for formalized reverse engineering on the internet, speaks volumes.

So, that’s our opinion. Maybe your opinion is different. Do you disagree with any of those advances or did we miss anything significant? Can you think of any technology that was supposed to be the future but then bombed spectacularly in practice? Let us know. 🙂

10 Responses to “Ten years of innovation in reverse engineering”

  1. Marc Ruef says:

    Hello,

    Very nice summary! Keep up the great work on your blog!

    Regards,

    Marc

  2. arebc says:

    I feel in some way that OpenRCE should be on that list. People like Ero Carrera, Pedram and the other admins that created and supported OpenRCE helped build a community. The work that the admins presented was inspiring for many individuals that were new in the field. I still use PeFile, Pyemu and many other tools/tricks that I learned about from OpenRCE.

    Awesome post!

    • Sebastian Porst says:

      I definitely see your point but I am not sure whether OpenRCE is exceptional enough. There were good communities 10 years ago too and OpenRCE really lost steam after a brief phase of initial enthusiasm.

  3. RD says:

    How about the trend of using emulated or virtualized environments in RE (bochs, qemu, virtualbox..)?

    • Sebastian Porst says:

      That’s a really good one. At zynamics we use different emulators for simple things like shellcode emulation in IDA to more advanced stuff like automated unpacking.

  4. Rolf Rolles says:

    I still use SoftICE.

  5. Cody Pierce says:

    Great list, I feel one major “trend” was left out. Being able to reverse engineer as a profession! It allows people to develop some of these tools and ideas while paying the bills 🙂

    • Sebastian Porst says:

      That’s a very good point, Cody. While writing the blog post I was talking to someone on AIM who brought it up too, in response to my complaints that software development seems to advance much faster than RE. But not only is the RE community obviously much smaller, it’s also been a viable profession for a few years so it’s not a big surprise.

  6. Python is the de-facto scripting language of reverse engineering and many tools from IDA Pro to ImmunityDebugger or BinNavi support Python scripting, totally good

  7. Anonymous says:

    What about OllyDbg?!