In the first and second part of this series I have given an overview of our Reverse Engineering Intermediate Language (REIL) and talked about the purpose and structure of the individual REIL instructions. This third part is about REIL code generation.
REIL is included in BinNavi to help users write their own code analysis algorithms, often based on abstract interpretation. However, obviously our users are not really interested in analyzing REIL code. What they really want is to analyze the native assembly code of their target binary file. The REIL language can nevertheless be used by all users of BinNavi. The translation from native assembly code to REIL code is just as simple and transparent as porting the results of REIL analysis back to the original code.
BinNavi 3.0 ships with REIL translators for ARM, MIPS, PowerPC, and x86 code. Code from any of these native assembly languages can be translated to REIL code with the same effect on the program state as the original code. Users can then analyze the effects of the much simpler REIL code and port the results of the analysis back to the original native assembly code they are really interested in.
The translation process from native assembly code to REIL code is very simple. Given a list of native assembly instructions, each individual instruction is translated to REIL code. In a second pass, the generated linear listing of REIL code is taken and converted into a control flow graph. This control flow graph is then passed to the user and he can use it in his code analysis algorithm.
When writing the translators, we made sure that all native assembly instructions could be translated to REIL code without needing any information about the instructions before or after the current instruction. This allowed us to keep the individual instruction translators stateless.
Generally multiple REIL instructions are emitted by the translators for any native input instruction. Consequently it is not possible to keep a 1:1 mapping between the addresses of the original assembly instructions and the addresses of their corresponding REIL instructions. The solution we found was to multiply every native address by 0x100 to calculate the base address of all REIL instructions that belong to a native instruction. The lowest byte of such REIL addresses can then be filled by an index that specifies the relative position of a single REIL instruction inside the sequence of REIL instructions generated from the same native input instruction. What I mean here is that if you have a native instruction at address 0x08 the REIL translators will generate REIL instructions for it at addresses 0x800, 0x801, 0x802, …
This way of translating native instruction addresses to REIL instruction addresses limits us to at most 256 REIL instructions for each native instruction. In practice, even getting to 30 REIL instructions for one native instruction is rare although we have a few outliers that are translated to up to 70 REIL instructions.
There is another big advantage to this address translation method. For any given REIL instruction you can immediately determine its native source instruction. There is no need to consider any additional context around the REIL instruction. This is really important if you want to port the results of a REIL analysis algorithm back to the original code. If you have determined a result for a REIL instruction you can just divide its REIL address to 0x100 and you know the original instruction for which the result holds too.
That’s it for now. If you have any questions about the translation process or REIL in general please leave a comment.
Is there any chance that you will publish your REIL specification & implementation plus the x86(-64 ?) to REIL translator freely at some point? I’d not have to reinvent the wheel with a custom IL and you could benefit from additional analysis tools to go with BinNavi… 🙂
Hi Georg,
we generally give the REIL translator implementation free of charge to people with interesting non-commercial ideas.
There is a big problem though. The REIL translators are algorithms and like all algorithms they require input in a specific format. Getting data from an external source like IDA Pro into this specific format is in itself a huge task (I’m talking about weeks of work here) and nearly everybody gives up on the way.
As far as I know only one person so far has managed to get their external disassembly source to work with the REIL translators and it took them months.
If you are still interested, write me an email.
Hello Sebastian,
Am a graduate student working on malware detection.
I am working on semantic aspect of PE files, looking for metamorphism.I created a large pool of samples, from vxheavens, and offensivecomputing. disassembled them, as well as preprossed them to be given as input to generate the control flow graph.
Can you please share the code with me ?. this would help a lot to move ahead
Sebastian,
I am researching automated vulnerability discovery and subsequent patching and exploitation. The REIL representation lends itself towards this purpose, and I would like to use the REIL translator. Is it available? If so, how can I get a copy?
Otherwise, is there a reverse-translator developed yet? If so, is it available?
Thanks,
Mark
Hi Mark,
The REIL translators are available via BinNavi. Also a different form of REIL is available as RREIL from here https://code.google.com/p/gdsl-toolkit/wiki/RReil. There is currently no other way to obtain a REIL translator. There is no reverse translator that I know of.
Is there an REIL to x86 back algorithm? I thought to REIL is an one-way translation.
There is no compiler for REIL to x86. So far it’s a one-way translation. We implemented a very, very limited REIL -> x86 compiler for a deobfuscation algorithm we toyed around with but it is not really usable in practice.
[…] REIL language» (part 1, part 2, part 3, part […]