Posts Tagged ‘reverse engineering’

ida2sql: exporting IDA databases to MySQL

Tuesday, June 29th, 2010

Today we are finally making it easier to get your hands on ida2sql, our set of scripts to export information contained in an IDA database into MySQL.

As a short recap, ida2sql is a set of IDAPython scripts to export most of the information contained in an IDB into a MySQL database. It has existed and evolved already for a few years and has been the main connection between IDA and BinNavi for the most of the life of the latter.

The last development efforts have been geared towards making the schema a bit more friendly (see below) and making it work in a fair range of IDA (5.4 to 5.7) and IDAPython versions (including some that shipped with IDA which had minor problems, ida2sql will automatically work around those issues). The script runs under Windows, Linux and OSX.

ida2sql is comprised of a ZIP archive containing the bulk of the scripts that simply needs to be copied to IDA’s plugin folder. No need to extract its contents. A second script, ida2sql.py, needs to be run from within IDA when we are ready to export data. You can keep it in any folder, it should be able to automatically locate the ZIP file within the plugins folder. You can download here a ready built package containing the ZIP file, the main script, a README and an example configuration file.

When the main Python file is executed in IDA and if all the dependencies are successfully imported the user will be presented with a set of dialogs to enter the database information. If the database is empty there will also be a message informing that the basic set of tables is about to be created at that point. Once all configuration steps have been completed the script will start processing the database, gathering data and finally inserting it all into the database.

The configuration process can be simplified by creating a config file ida2sql.cfg in IDA’s main directory (or by pointing to it the IDA2SQLCFG environment variable). If ida2sql can find that file it will not ask for any of the configuration options and go straight into the exporting.

Automation

ida2sql has a batch mode that comes handy when you need to export a collection of IDBs into the database. To run ida2sql in batch mode it’s enough to set the corresponding option in the configuration file.

mode: batch

An operation mode of “batch” or “auto” indicates that no questions or other kind of interaction should be requested from the user. (Beware though that IDA might still show dialogs like those reminding of a license or free-updates period about to expire. In those cases run IDA through the GUI and select to never show again those reminders). The batch mode is specially useful when running ida2sql from the command line, for instance:

idag.exe -A -OIDAPython:ida2sql.py database.idb|filename.exe

Requirements

  • mysql-python
  • A relatively recent IDA (tested with 5.4, 5.5, 5.6 and the latest beta of 5.7)
  • IDAPython. Chances are that you already have it if you are running a recent IDA version
  • A MySQL database. It does not need to reside on the same host

The schema

A frequent criticism to the schema design has always been the use of a set of tables per each module. People have asked why not use instead using a common table-set for all modules in the database. While we considered this approach in the original design, we opted for using a set of tables per module. We are storing operand trees in an optimized way aiming at reducing redundant information by keeping a single copy of all the common components of the operand’s expression tree. Such feature would be extremely difficult to support were we to use a different a different schema. Additionally tables can easily grow to many tens of millions of rows for large modules. Exporting hundreds of large modules could lead to real performance problems.

The table “modules” keeps track of all IDBs that have been exported into the database and a set of all the other tables exists for each module.

BinNavi DB Version 2

BinNavi DB Version 2

The following, rather massive, SQL statement shows how to retrieve a basic instruction dump for all exported code from an IDB. (beware of the placeholder “_?_”)

[sourcecode language=”sql”]
SELECT
HEX( functions.address ) AS functionAddress,
HEX( basicBlocks.address ) AS basicBlockAddress,
HEX( instructions.address ) AS instructionAddress,
mnemonic, operands.position,
expressionNodes.id, parent_id,
expressionNodes.position, symbol, HEX( immediate )
FROM
ex_?_functions AS functions
INNER JOIN ex_?_basic_blocks AS basicBlocks ON
basicBlocks.parent_function = functions.address
INNER JOIN ex_?_instructions AS instructions ON
basicBlocks.id = instructions.basic_block_id
INNER JOIN ex_?_operands AS operands ON
operands.address = instructions.address
INNER JOIN ex_?_expression_tree_nodes AS operandExpressions ON
operandExpressions.expression_tree_id = operands.expression_tree_id
INNER JOIN ex_?_expression_nodes AS expressionNodes ON
expressionNodes.id = operandExpressions.expression_node_id
ORDER BY
functions.address, basicBlocks.address,
instructions.sequence, operands.position,
expressionNodes.parent_id,
expressionNodes.position;
[/sourcecode]

Limitations and shortcomings

The only architectures supported are x86 (IDA’s metapc), ARM and PPC. The design is pretty modular and supports adding new architectures by simply adding a new script. The best way to go about it would be to take a look at one of the existing scripts (PPC and ARM being the simplest and most manageable) and modify them as needed.

ida2sql has been designed with the goal in mind of providing an information storage for our products, such as BinNavi. It will only export code that is contained within functions. If you have an IDB that has not been properly cleaned or analyzed and contains snippets/chunks of code not related to functions, those will not be exported. Examples of some cases would be exception handlers that might only be referenced through a data reference (if at all) or switch-case statements that haven’t been fully resolved by IDA.

The scripts have exported IDBs with hundreds of thousands of instructions and many thousands of functions. Nonetheless the larger the IDB the more memory the export process is going to require. ida2sql’s performance scales mostly linearly when exporting. It should not degrade drastically for larger files. It will also make use of temporary files that can grow large (few hundred MBs if the IDB is tens of MBs in size when compressed). Those should not be major limitations for most uses of ida2sql.

Also it’s worth noting that IDA 5.7 has introduced changes to the core of IDAPython and the tests we have made so far with the current beta the performance of ida2sql has improved significantly. In the following figures you can see the export times in seconds for some IDBs exported with IDA 5.5, 5.6 and 5.7.

ida2sql export times

ida2sql export times for a medium size file

ida2sql export times

ida2sql export times for a set of small IDBs

Summing up. We hope this tool will come handy for anyone looking into automating mass analysis and has been bitten by the opaque and cumbersome IDBs. Give it a spin, look at the source code, break it and don’t forget to let us know how it could be improved! (patches are welcome! 😉 )