On our way back home from Black Hat Europe in Barcelona, Thomas and I were brainstorming about the most important changes to the field of binary code reverse engineering in the last 10 years. What has changed since then? What made the biggest impact? Remember: Back in the dark days of 2000, W32Dasm and Turbo Debugger were considered good reverse engineering tools. If you had a self-written tracer that logged the execution of conditional jumps you were basically a king.
Anyway, we came up with several trends and technologies we believe have changed the job of reverse engineers tremendously since 2000. Here they are:
Visual flow graphs for assembly code
First introduced in IDA Pro 4.17 (June 2001), the ability to view disassembled assembly code in graph form made the job of reverse engineers much easier. In essence, using visual flow graphs during reverse engineering raises the level of abstraction and understanding of code while at the same time lowering the required time and effort one has to invest. Before we had graphs we had to reconstruct control-flow structures like loops and if-else statements from linearly listed assembly instructions. With visual flow graphs we can just look at the graph and understand the control flow pretty much immediately.
In the following years other tools (such as BinNavi) were built around the idea of interacting with flowgraphs. Shortly thereafter, the graph engine of IDA Pro was improved (especially in IDA Pro 5.0, March 2006) to provide interactive graphing out of the box.
Python as a scripting language
Back in 2000, most reverse engineering tools were primitive and barely extensible. For disassemblers your best bet was a clumsy IDC implementation in IDA Pro 4. For debuggers the situation looked even bleaker. This all changed with the growing popularity of the scripting language Python and SWIG, a technology which allows programs to easily add a Python interpreter and expose a Python-based API. The first major step forward I can remember was the creation of the IDAPython plugin for IDA Pro which added a way to access the IDA API from Python (Gergely Erdelyi, 2004). Later we had tools like Pedram Amini’s PyDbg or Ero Carrera’s pefile that helped popularize the Python language in reverse engineering.
Today, Python is the de-facto scripting language of reverse engineering and many tools from IDA Pro to ImmunityDebugger or BinNavi support Python scripting.
Even though the technology is not brand-new (the first publications describing ‘Dynamo’ go back to 2000), the widespread use of dynamic instrumentation tools like DynamoRIO and Pin for reverse engineering certainly is. Using these frameworks you can build very powerful dynamic analysis tools that allow the monitoring and manipulation of instruction streams in a very transparent and highly efficient way. If you have never used either of these tools, you can imagine them like a way to efficiently receive a callback to a C/C++ program after every instruction. Using these, you can directly control every aspect of the targeted program, while incurring small overhead.
If you are looking for a new reverse engineering tool to do some research with, dynamic instrumentation might be for you: Working on actual program traces removes a lot of complication in comparison to the static case, and the many different productive uses of dynamic instrumentation are still far from exhausted. While relatively fresh and untapped, dynamic instrumentation tools are definitely a topic people talk about at IT security conferences and elsewhere.
Many years ago, some smart people had a brilliant idea: If you compare an unpatched version of a file to a patched version of the same file, you can easily find what code was changed by the patch and use this information to quickly find vulnerabilities that were patched by the patch. Soon it became evident that new tools were needed that make the process of comparing two versions of the same file as quick and easy as possible. Our own BinDiff tool is maybe the most popular diffing engine for binary code today. However, the idea of comparing files is so popular that a number of free competitors have sprung up over the years. In general, these tools all work in the same way: Once the two input files are disassembled, the functions in file A are matched to the functions in file B and local changes to the matched functions are found and shown to the user.
BinDiff-style tools are now part of the standard toolbox of many reverse engineers, from vulnerability researchers to malware analysts and there is hardly another technology that rose as spectacularly as this one since 2000.
The end of SoftICE
Back in the days there was just one debugger everybody used for reverse engineering: SoftICE. SoftICE was a wonderful debugger originally written by a company called NuMega from New Hampshire. It was a debugger that allowed you to debug user-land programs as well as kernel-land programs on your localhost machine without the need for any complicated setup. Later, NuMega was bought by Compuware and SoftICE was discontinued in April 2006.
Of course, newer debuggers have replaced SoftICE today. Microsoft’s own WinDbg, while not nearly as pretty as SoftICE, is the new powerful and popular debugger on the block.
The arrival of the Hex-Rays decompiler
Back in 2000, decompilers sucked. Today, there is Hex-Rays. Back in 2007 the team behind IDA Pro released the first decompiler I am aware of that is actually useful. Since then they have continued to improve the decompiler and they are already showcasing support for ARM decompilation.
While not many people seem to use Hex-Rays yet, this product is definitely one to keep an eye on.
Collaborative Reverse Engineering
Back in 2000, collaborative reverse engineering was unheard of as it was really difficult to exchange reverse engineered information between two databases created by the same program, let alone between different programs. In recent years the situation changed a bit, probably mostly out of necessity. Software today is much more complex than it was ten years ago and very often teams of reverse engineers have to collaborate on the same project.
While still in their infancy, collaborative reverse engineering tools are here to stay and will probably become even more popular in the future. Reverse engineers will pick tools like Chris Eagle’s CollabREate for IDA Pro or our own BinCrowd to share their results with friends and colleagues.
Another trend of the last few years is that major universities research topics related to binary code reverse engineering. Among others, there are the University of Berkeley and Carnegie Mellon University which have done really impressive work in the last few years. At the same time, reverse engineers in the industry have begun to take note of academic approaches to reverse engineering. While academic approaches to reverse engineering are not yet in common use in the industry, we know many people and companies that are beginning to look into more formalized ways to reverse engineering. The popularity of the Reverse Engineering Reddit, maybe the primary resource for formalized reverse engineering on the internet, speaks volumes.
So, that’s our opinion. Maybe your opinion is different. Do you disagree with any of those advances or did we miss anything significant? Can you think of any technology that was supposed to be the future but then bombed spectacularly in practice? Let us know.