In the first and second part of this series I have given an overview of our Reverse Engineering Intermediate Language (REIL) and talked about the purpose and structure of the individual REIL instructions. This third part is about REIL code generation.
REIL is included in BinNavi to help users write their own code analysis algorithms, often based on abstract interpretation. However, obviously our users are not really interested in analyzing REIL code. What they really want is to analyze the native assembly code of their target binary file. The REIL language can nevertheless be used by all users of BinNavi. The translation from native assembly code to REIL code is just as simple and transparent as porting the results of REIL analysis back to the original code.
BinNavi 3.0 ships with REIL translators for ARM, MIPS, PowerPC, and x86 code. Code from any of these native assembly languages can be translated to REIL code with the same effect on the program state as the original code. Users can then analyze the effects of the much simpler REIL code and port the results of the analysis back to the original native assembly code they are really interested in.
The translation process from native assembly code to REIL code is very simple. Given a list of native assembly instructions, each individual instruction is translated to REIL code. In a second pass, the generated linear listing of REIL code is taken and converted into a control flow graph. This control flow graph is then passed to the user and he can use it in his code analysis algorithm.
When writing the translators, we made sure that all native assembly instructions could be translated to REIL code without needing any information about the instructions before or after the current instruction. This allowed us to keep the individual instruction translators stateless.
Generally multiple REIL instructions are emitted by the translators for any native input instruction. Consequently it is not possible to keep a 1:1 mapping between the addresses of the original assembly instructions and the addresses of their corresponding REIL instructions. The solution we found was to multiply every native address by 0x100 to calculate the base address of all REIL instructions that belong to a native instruction. The lowest byte of such REIL addresses can then be filled by an index that specifies the relative position of a single REIL instruction inside the sequence of REIL instructions generated from the same native input instruction. What I mean here is that if you have a native instruction at address 0x08 the REIL translators will generate REIL instructions for it at addresses 0x800, 0x801, 0x802, …
This way of translating native instruction addresses to REIL instruction addresses limits us to at most 256 REIL instructions for each native instruction. In practice, even getting to 30 REIL instructions for one native instruction is rare although we have a few outliers that are translated to up to 70 REIL instructions.
There is another big advantage to this address translation method. For any given REIL instruction you can immediately determine its native source instruction. There is no need to consider any additional context around the REIL instruction. This is really important if you want to port the results of a REIL analysis algorithm back to the original code. If you have determined a result for a REIL instruction you can just divide its REIL address to 0x100 and you know the original instruction for which the result holds too.
That’s it for now. If you have any questions about the translation process or REIL in general please leave a comment.