Author Archive

ReCon slides – “Packer Genetics: The Selfish Code” & Bochs+Python

2010-07-16

A few days ago Jose and Ero presented in ReCon some of the latest ideas they have been working on regarding unpacking. We have put our slides up for your viewing pleasure here:

Our slides are also available for download here. Beware that they are merely a visual aid to our live presentation. We will try to remember to announce when the ReCon video comes out so you can follow them there.

In addition, Jose will be presenting on the topic in SysCan Taipei on August 20th. That will be another good chance to catch the info fresh and live.

Bochs and Python

Bochs and our custom Python extensions were one of the fundamental tools onto which we built our research.

Ero has been keeping the Python extensions up to date for a few years and they are something we use a lot at zynamics. We have attempted to make them public in a few occasions (an old patch is available in the Bochs mailing list) but those attempts failed to make them known to more users. We are frequently reminded at conferences that people would love to play with them, so this time we are making them available through a zynamics GitHub project. The plan is to keep them in sync with all major releases of Bochs. In the GitHub page you can find basic instructions on how to get them working. The patch to apply to the current public version of Bochs (2.4.5 at this time) can be found here

We will add usage examples to the GitHub wiki as time allows. Also if there are special requests we will try to provide exemples on how to use the extensions for those cases. Download them, play with them and let us know your thoughts.

ida2sql: exporting IDA databases to MySQL

2010-06-29

Today we are finally making it easier to get your hands on ida2sql, our set of scripts to export information contained in an IDA database into MySQL.

As a short recap, ida2sql is a set of IDAPython scripts to export most of the information contained in an IDB into a MySQL database. It has existed and evolved already for a few years and has been the main connection between IDA and BinNavi for the most of the life of the latter.

The last development efforts have been geared towards making the schema a bit more friendly (see below) and making it work in a fair range of IDA (5.4 to 5.7) and IDAPython versions (including some that shipped with IDA which had minor problems, ida2sql will automatically work around those issues). The script runs under Windows, Linux and OSX.

ida2sql is comprised of a ZIP archive containing the bulk of the scripts that simply needs to be copied to IDA’s plugin folder. No need to extract its contents. A second script, ida2sql.py, needs to be run from within IDA when we are ready to export data. You can keep it in any folder, it should be able to automatically locate the ZIP file within the plugins folder. You can download here a ready built package containing the ZIP file, the main script, a README and an example configuration file.

When the main Python file is executed in IDA and if all the dependencies are successfully imported the user will be presented with a set of dialogs to enter the database information. If the database is empty there will also be a message informing that the basic set of tables is about to be created at that point. Once all configuration steps have been completed the script will start processing the database, gathering data and finally inserting it all into the database.

The configuration process can be simplified by creating a config file ida2sql.cfg in IDA’s main directory (or by pointing to it the IDA2SQLCFG environment variable). If ida2sql can find that file it will not ask for any of the configuration options and go straight into the exporting.

Automation

ida2sql has a batch mode that comes handy when you need to export a collection of IDBs into the database. To run ida2sql in batch mode it’s enough to set the corresponding option in the configuration file.

mode: batch

An operation mode of “batch” or “auto” indicates that no questions or other kind of interaction should be requested from the user. (Beware though that IDA might still show dialogs like those reminding of a license or free-updates period about to expire. In those cases run IDA through the GUI and select to never show again those reminders). The batch mode is specially useful when running ida2sql from the command line, for instance:

idag.exe -A -OIDAPython:ida2sql.py database.idb|filename.exe

Requirements

  • mysql-python
  • A relatively recent IDA (tested with 5.4, 5.5, 5.6 and the latest beta of 5.7)
  • IDAPython. Chances are that you already have it if you are running a recent IDA version
  • A MySQL database. It does not need to reside on the same host

The schema

A frequent criticism to the schema design has always been the use of a set of tables per each module. People have asked why not use instead using a common table-set for all modules in the database. While we considered this approach in the original design, we opted for using a set of tables per module. We are storing operand trees in an optimized way aiming at reducing redundant information by keeping a single copy of all the common components of the operand’s expression tree. Such feature would be extremely difficult to support were we to use a different a different schema. Additionally tables can easily grow to many tens of millions of rows for large modules. Exporting hundreds of large modules could lead to real performance problems.

The table “modules” keeps track of all IDBs that have been exported into the database and a set of all the other tables exists for each module.

BinNavi DB Version 2

BinNavi DB Version 2

The following, rather massive, SQL statement shows how to retrieve a basic instruction dump for all exported code from an IDB. (beware of the placeholder “_?_”)

SELECT
   HEX( functions.address ) AS functionAddress,
   HEX( basicBlocks.address ) AS basicBlockAddress,
   HEX( instructions.address ) AS instructionAddress,
   mnemonic, operands.position,
   expressionNodes.id, parent_id, 
   expressionNodes.position, symbol, HEX( immediate )
FROM 
    ex_?_functions AS functions
INNER JOIN ex_?_basic_blocks AS basicBlocks ON
    basicBlocks.parent_function = functions.address
INNER JOIN ex_?_instructions AS instructions ON 
    basicBlocks.id = instructions.basic_block_id
INNER JOIN ex_?_operands AS operands ON 
    operands.address = instructions.address
INNER JOIN ex_?_expression_tree_nodes AS operandExpressions ON
    operandExpressions.expression_tree_id = operands.expression_tree_id
INNER JOIN ex_?_expression_nodes AS expressionNodes ON
    expressionNodes.id = operandExpressions.expression_node_id
ORDER BY 
    functions.address, basicBlocks.address,
    instructions.sequence, operands.position, 
    expressionNodes.parent_id,
    expressionNodes.position;

Limitations and shortcomings

The only architectures supported are x86 (IDA’s metapc), ARM and PPC. The design is pretty modular and supports adding new architectures by simply adding a new script. The best way to go about it would be to take a look at one of the existing scripts (PPC and ARM being the simplest and most manageable) and modify them as needed.

ida2sql has been designed with the goal in mind of providing an information storage for our products, such as BinNavi. It will only export code that is contained within functions. If you have an IDB that has not been properly cleaned or analyzed and contains snippets/chunks of code not related to functions, those will not be exported. Examples of some cases would be exception handlers that might only be referenced through a data reference (if at all) or switch-case statements that haven’t been fully resolved by IDA.

The scripts have exported IDBs with hundreds of thousands of instructions and many thousands of functions. Nonetheless the larger the IDB the more memory the export process is going to require. ida2sql’s performance scales mostly linearly when exporting. It should not degrade drastically for larger files. It will also make use of temporary files that can grow large (few hundred MBs if the IDB is tens of MBs in size when compressed). Those should not be major limitations for most uses of ida2sql.

Also it’s worth noting that IDA 5.7 has introduced changes to the core of IDAPython and the tests we have made so far with the current beta the performance of ida2sql has improved significantly. In the following figures you can see the export times in seconds for some IDBs exported with IDA 5.5, 5.6 and 5.7.

ida2sql export times

ida2sql export times for a medium size file

ida2sql export times

ida2sql export times for a set of small IDBs

Summing up. We hope this tool will come handy for anyone looking into automating mass analysis and has been bitten by the opaque and cumbersome IDBs. Give it a spin, look at the source code, break it and don’t forget to let us know how it could be improved! (patches are welcome! 😉 )

Exploring malware relations

2010-04-13

Peter Silberman and I will be presenting, this Wednesday at BlackHat Europe 2010, our latest peek into the malware world.

In this talk we follow up from our previous “State of Malware: Explosion of the Axis of Evil” , where we explored some of the idiosyncrasies of mass-malware vs. the ones in targeted malware. This time around we decided to try to spot actual code-sharing among a reasonably diverse corpus of recent malware obtained from VirusTotal.

Using some of the technology we’ve developed in zynamics to automatically classify malware we attempted to find relations between malware samples that would indicate that code had been shared, possibly indicating collaborative efforts of malware developers or even giving clues about common authorship.

If you want to hear what it is that we found out, don’t hesitate in dropping by our talk Wednesday 16:45h

See you there!