Archive for July, 2010

Win32 Kernel Debugging with BinNavi

2010-07-29

Hi everyone,

we – that would be Andy and Felix – are student interns at zynamics in Bochum. We both study IT-Security at the University of Bochum in our 8th semester and have both been with the company for several years now.

For the last half year we have been working together on a WinDbg kernel-debugging interface for zynamics’ reverse engineering tool BinNavi. After our latest bug fixes and code improvements we now feel ready to announce that this piece of software has finally reached alpha status. It is now almost feature complete but still has some rough edges and known bugs.

What does this mean to you as a (maybe future) BinNavi customer?

You will be able to use all the advanced debugging features of BinNavi for remote Win32 driver and kernel debugging. All you need to have is a machine with BinNavi and the Microsoft Debugging Tools for Windows installed and – of course – a second Win32 (virtual) machine you want to debug. Given these prerequisites, you can directly start to explore the vast and dark realms of Win32 kernel land from an easy-to-use, nice and cozy GUI. There are probably other tools out there to do this. But only BinNavi provides you with all the powerful features of our Differential Debugging technology. Please see Sebastian’s post from a few weeks ago to understand why we are so excited to bring this technology to ring0.

To give you an idea of what kernel debugging with BinNavi looks like, we took three screenshots. The first one shows the driver selection dialog that BinNavi displays right after attaching to a target machine. The second one displays a function trace of mrxsmb.sys on an idle Windows XP machine connected to a network. The 150 functions called during our trace are enumerated in the lower middle, while the recorded register and memory values for each call are displayed in the lower right. In the third screenshot you can see us single-stepping a random function in mrxsmb.sys.

Selection of the target driver

Function trace of mrxsmb.sys on idle machine

Single-stepping mrxsmb.sys

Once we are done polishing our code, we will post here again to demonstrate how this technology can facilitate the process of finding the interesting code parts in Win32 drivers. Specifically, we will use Differential Debugging to pinpoint the code parts that are responsible for password processing inside the driver of a certain closed-source HDD encryption product. This is interesting both for writing a password brute-forcer and for checking for implementation mistakes.

If you are an existing BinNavi customer and want to play a bit with the current alpha version, just let us know – we will be happy to supply you with the latest build. Besides that, the final version will be shipped to all customers with one of the next BinNavi updates.

Dumping shellcode with Pin

2010-07-28

About six weeks ago, when I blogged about the Adobe Reader/Flash 0-day that was making the rounds back then, I talked about generating automated shellcode dumps with Pin. In this post I want to talk a bit about Pin, dynamic binary instrumentation, and the shellcode dumper Pintool we developed at zynamics.

Dynamic binary instrumentation is a technique for analyzing binary files by executing the files and injecting analysis code into the binary file at runtime. This method is not exactly new. It has been in use for many years already, for example in program verification, profiling, and compiler optimization. However, despite its amazing power and ease of use, dynamic binary instrumentation is still not widely used by binary code reverse engineers.

The two most important dynamic binary instrumentation tools for binary code reverse engineers are Pin and DynamoRIO. Pin is developed by Intel and provided by the University of Virginia while DynamoRIO is a collaboration between Hewlett-Packard and MIT. Both are free to use but only DynamoRIO is open source.

In general, both Pin and DynamoRIO are very similar to use. If you want to use either tool you have to write a C or C++ plugin that contains your analysis code. This code is then injected into the target process by Pin or DynamoRIO. For most reverse engineering purposes, Pin and DynamoRIO are both equally useful and when you talk to reverse engineers who make use of dynamic binary instrumentation it often seems to be a matter of personal taste which tool they prefer. At zynamics we use Pin because the API for analysis code seems cleaner to us.

Let’s get back to shellcode dumping now. The idea behind shellcode detection is rather simple: Whenever an instruction is executed, check if that instruction belongs to a section of a loaded module in the address space of the target process. If that’s the case, then the instruction is considered legit (not shellcode). If, however, the instruction is outside of any module section, and therefore most likely on the stack or allocated heap memory, the instruction is considered shellcode. This heuristic is not perfect, of course. However, it works surprisingly well in practice.
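The check described above can be sketched in a few lines of Python. This is only an illustration of the heuristic, not the code of our Pintool: the Section type, the function names, and the address ranges in EXAMPLE_SECTIONS are made up for this example.

```python
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    module: str  # name of the loaded module, e.g. "EScript.api"
    start: int   # first address that belongs to the section
    end: int     # one past the last address of the section


def find_section(address: int, sections: List[Section]) -> Optional[Section]:
    """Return the module section that contains the address, if any."""
    for section in sections:
        if section.start <= address < section.end:
            return section
    return None


def is_shellcode(address: int, sections: List[Section]) -> bool:
    """An instruction outside every module section is treated as shellcode."""
    return find_section(address, sections) is None


# Hypothetical section layout; the ranges are invented for illustration.
EXAMPLE_SECTIONS = [
    Section("EScript.api", 0x23800000, 0x23900000),
    Section("Multimedia.api", 0x2D800000, 0x2D900000),
]
```

With this layout, an address inside EScript.api counts as legit, while an address such as 0x0A0A0A0A, which lies in no section, is flagged as shellcode.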

In fact, the Pintool we developed does exactly this. For every executed instruction in the target process it performs the check described in the above paragraph. Until shellcode is found, the Pintool keeps track of up to 100 legit instructions executed before the shellcode. Then, when shellcode is found, it dumps the legit instructions before the shellcode and the shellcode itself. The big value of our Pintool is not that it dumps the shellcode. The big value is that it tells you exactly where control flow is transferred from the legit code to the shellcode. With this information you can quickly find the vulnerability in the exploited program. If you are really interested in the shellcode, you can also just set a breakpoint on the last legit instruction before the shellcode and do a manual analysis from there.
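The bookkeeping of the last 100 legit instructions can be modeled with a bounded queue. The following Python sketch shows the idea under simplifying assumptions: instructions are represented only by their addresses, and the is_legit callback stands in for the section check. The function name and signature are hypothetical and are not taken from the Pintool source.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional, Tuple

HISTORY_SIZE = 100  # number of legit instructions remembered before the shellcode


def trace_until_shellcode(
    executed: Iterable[int],
    is_legit: Callable[[int], bool],
    history_size: int = HISTORY_SIZE,
) -> Tuple[List[int], Optional[int]]:
    """Return the last legit addresses and the first shellcode address.

    The second element is None if no shellcode was seen in the trace.
    """
    history: Deque[int] = deque(maxlen=history_size)  # old entries drop out
    for address in executed:
        if is_legit(address):
            history.append(address)
        else:
            return list(history), address
    return list(history), None
```

The last element of the returned history is the instruction that transferred control to the shellcode, which is the place to set a breakpoint for a manual analysis.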

You can find the complete documented source of our shellcode dumper Pintool on the zynamics GitHub. The source code is surprisingly short and I did my best to document the code. If you have any questions about the source code, please leave a comment on this blog entry or contact me in some other way.

Let’s take a look at the output of a sample run now. You can find the full trace log on GitHub but here are the important parts.

What you always want to look for is the string “Executed before”. These are the legit instructions before control flow is transferred to the shellcode. Ignore the first occurrence of this string. I have not had a closer look at what code is recognized as shellcode there, but it might be JIT-compiled Adobe Reader JavaScript code (which also does not belong to any section of any module and is therefore detected as shellcode). The second occurrence of the string is what you want to look at.

Executed before
0x238C038E::EScript.api  E8 D2 72 F6 FF  call 0x23827665
0x238BBF4F::EScript.api  8B 44 24 04     mov eax, dword ptr [esp+0x4]
0x238BBF53::EScript.api  C6 40 FF 01     mov byte ptr [eax-0x1], 0x1

[ ... more EScript.api instructions ... ]

0x2D841E82::Multimedia.api  56           push esi
0x2D841E83::Multimedia.api  8B 74 24 08  mov esi, dword ptr [esp+0x8]
0x2D841E87::Multimedia.api  85 F6        test esi, esi
0x2D841E89::Multimedia.api  74 22        jz 0x2d841ead
0x2D841E8B::Multimedia.api  56           push esi

[ ... more Multimedia.api instructions ... ]

0x2D841E96::Multimedia.api  8B 10        mov edx, dword ptr [eax]
0x2D841E98::Multimedia.api  8B C8        mov ecx, eax
0x2D841E9A::Multimedia.api  FF 52 04     call dword ptr [edx+0x4]

Shellcode:
0x0A0A0A0A::  0A 0A                      or cl, byte ptr [edx]
0x0A0A0A0C::  0A 0A                      or cl, byte ptr [edx]
0x0A0A0A0E::  0A 0A                      or cl, byte ptr [edx]

[ ... more shellcode ... ]

The log clearly shows that control flow is transferred from EScript.api (the Adobe Reader JavaScript engine) to Multimedia.api (the Adobe Reader multimedia library) and then to shellcode which does not lie in any module section. You can even see that the exploit code controls the memory address of [edx + 0x4] in the last executed instruction of Multimedia.api. With this knowledge you can work back from this point to see what exactly the vulnerability is that was exploited. In the example, the vulnerability was the use-after-free media.newPlayer vulnerability of older Adobe Reader versions. This vulnerability uses JavaScript code to trigger a bug in the Multimedia API.
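If you want to pull the transition points out of a log automatically, a small parser is enough. The Python sketch below assumes that every instruction line looks like the excerpt above (address, "::", optional module name, uppercase hex bytes, disassembly) and that the headers are exactly "Executed before" and "Shellcode:". The full trace log may contain lines this sketch does not handle, and all names in it are illustrative.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# address::module  hex bytes  disassembly (the module is empty for shellcode)
LINE = re.compile(
    r"^(0x[0-9A-Fa-f]+)::(\S*)\s+((?:[0-9A-F]{2} )*[0-9A-F]{2})\s+(.*)$"
)


class Entry(NamedTuple):
    address: int
    module: str
    raw: bytes
    text: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one instruction line; return None for headers and other text."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    address, module, raw, text = match.groups()
    return Entry(int(address, 16), module, bytes.fromhex(raw), text)


def find_transitions(log: str) -> List[Tuple[Entry, Entry]]:
    """Pair the last legit instruction with the first shellcode instruction."""
    transitions = []
    last_legit = None
    expect_shellcode = False
    for line in log.splitlines():
        stripped = line.strip()
        if stripped == "Executed before":
            last_legit, expect_shellcode = None, False
            continue
        if stripped == "Shellcode:":
            expect_shellcode = True
            continue
        entry = parse_line(line)
        if entry is None:
            continue
        if expect_shellcode:
            if last_legit is not None:
                transitions.append((last_legit, entry))
            expect_shellcode = False
        elif entry.module:
            last_legit = entry
    return transitions
```

Applied to the excerpt above, the single transition found is the call at 0x2D841E9A in Multimedia.api followed by the first shellcode instruction at 0x0A0A0A0A.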

In case you are wondering about gaps in the instruction trace (for example at calls), please note that each instruction is only dumped once. So, if a function is called twice, the second call is not dumped to the output file anymore. This behavior was added to keep log files small.
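To see why dumping each instruction only once produces gaps, consider this small Python sketch. It is a simplified model that tracks addresses only; the function name is made up and the addresses in the usage note are invented.

```python
from typing import Iterable, List, Set


def dump_once(executed: Iterable[int]) -> List[int]:
    """Keep only the first execution of every instruction address."""
    seen: Set[int] = set()
    dumped: List[int] = []
    for address in executed:
        if address not in seen:
            seen.add(address)
            dumped.append(address)
    return dumped
```

Assume a caller at 0x10 to 0x13 that calls a two-instruction function at 0x50 twice, so the executed addresses are 0x10, 0x50, 0x51, 0x11, 0x12, 0x50, 0x51, 0x13. The dump then contains 0x10, 0x50, 0x51, 0x11, 0x12, 0x13: the body of the function is missing after the second call at 0x12, which is exactly the kind of gap you see in the log.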

I think the shellcode dumper is a good example of a first Pintool. The idea behind it is really simple and the actual Pintool can be improved by interested readers in many ways (dump register values or improve the shellcode detection heuristic, for example). If you are making improvements to the tool, please let us know.

PDF Dissector 1.4.0 released

2010-07-22

PDF Dissector 1.4.0 (Product site / Manual) fixes a few PDF parser bugs, improves the Adobe Reader emulation, and adds a cool new feature you can use for searching through all open PDF files.

Here is the detailed list of changes:

  • Feature: Added a way to search through the content of all open files.
  • Feature: Annotation names are now correctly emulated.
  • Bugfix: Fixed a parser bug that led to missed data streams if there was a comment between an object and its stream.
  • Bugfix: Fixed a parser bug that led to crashes when parsing invalid octal numbers.
  • Bugfix: All tabs belonging to a file are now closed when closing the file.

In PDF Dissector 1.4.0 you will now find a text field above the PDF browser where you can enter text strings to search for. The search function searches through dictionary keys, dictionary values, strings, data streams and other elements of PDF files and displays only those elements that match the search string. This is very useful if you want to answer questions like ‘which PDF files of my collection use JavaScript?’

The screenshot below shows how I used the filter to search through about 50 PDF files for exactly those that use the OpenAction command to execute some code when the PDF file is opened.

The new filter function in PDF Dissector 1.4.0

The REIL language – Part III

2010-07-19

In the first and second part of this series I have given an overview of our Reverse Engineering Intermediate Language (REIL) and talked about the purpose and structure of the individual REIL instructions. This third part is about REIL code generation.

REIL is included in BinNavi to help users write their own code analysis algorithms, often based on abstract interpretation. However, obviously our users are not really interested in analyzing REIL code. What they really want is to analyze the native assembly code of their target binary file. The REIL language can nevertheless be used by all users of BinNavi. The translation from native assembly code to REIL code is just as simple and transparent as porting the results of REIL analysis back to the original code.

BinNavi 3.0 ships with REIL translators for ARM, MIPS, PowerPC, and x86 code. Code from any of these native assembly languages can be translated to REIL code with the same effect on the program state as the original code. Users can then analyze the effects of the much simpler REIL code and port the results of the analysis back to the original native assembly code they are really interested in.

The translation process from native assembly code to REIL code is very simple. Given a list of native assembly instructions, each individual instruction is translated to REIL code. In a second pass, the generated linear listing of REIL code is taken and converted into a control flow graph. This control flow graph is then passed to the user and he can use it in his code analysis algorithm.
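The two passes can be sketched with a toy example in Python. Please note that this is not the BinNavi translator: the two supported "native" instructions, the tuple layout, and the function names are invented for illustration, and the listing is assumed to be sorted by address. Only the REIL mnemonics str, jcc and unkn and their three-operand form are taken from REIL itself.

```python
from typing import Dict, List, Tuple

# (address, mnemonic, operands); plain tuples stand in for instruction objects.
Instruction = Tuple[int, str, Tuple[str, ...]]


def translate(native: Instruction) -> List[Instruction]:
    """First pass, toy version: translate one native instruction in isolation."""
    address, mnemonic, operands = native
    base = address * 0x100  # REIL base address of this native instruction
    if mnemonic == "mov":  # mov dst, src  ->  str src, , dst
        dst, src = operands
        return [(base, "str", (src, "", dst))]
    if mnemonic == "jmp":  # jmp target  ->  jcc 1, , target
        target = int(operands[0], 16) * 0x100
        return [(base, "jcc", ("1", "", hex(target)))]
    return [(base, "unkn", ("", "", ""))]  # not modeled by this sketch


def build_cfg(
    listing: List[Instruction],
) -> Tuple[Dict[int, List[Instruction]], List[Tuple[int, int]]]:
    """Second pass: split the linear listing into basic blocks and edges."""
    if not listing:
        return {}, []
    leaders = {listing[0][0]}
    for position, (_, mnemonic, operands) in enumerate(listing):
        if mnemonic == "jcc":
            leaders.add(int(operands[2], 16))  # jump target starts a block
            if position + 1 < len(listing):
                leaders.add(listing[position + 1][0])  # so does the successor
    blocks: Dict[int, List[Instruction]] = {}
    current = listing[0][0]
    for instruction in listing:
        if instruction[0] in leaders:
            current = instruction[0]
            blocks[current] = []
        blocks[current].append(instruction)
    starts = sorted(blocks)
    edges: List[Tuple[int, int]] = []
    for index, start in enumerate(starts):
        _, mnemonic, operands = blocks[start][-1]
        has_next = index + 1 < len(starts)
        if mnemonic == "jcc":
            target = int(operands[2], 16)
            if target in blocks:
                edges.append((start, target))
            if operands[0] != "1" and has_next:  # conditional: may fall through
                edges.append((start, starts[index + 1]))
        elif has_next:
            edges.append((start, starts[index + 1]))
    return blocks, edges
```

Because translate only looks at the instruction it is given, the first pass is a simple loop over the input. All control flow knowledge is recovered in the second pass from the jcc instructions in the generated listing.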

When writing the translators, we made sure that all native assembly instructions could be translated to REIL code without needing any information about the instructions before or after the current instruction. This allowed us to keep the individual instruction translators stateless.

Generally multiple REIL instructions are emitted by the translators for any native input instruction. Consequently it is not possible to keep a 1:1 mapping between the addresses of the original assembly instructions and the addresses of their corresponding REIL instructions. The solution we found was to multiply every native address by 0x100 to calculate the base address of all REIL instructions that belong to a native instruction. The lowest byte of such REIL addresses can then be filled by an index that specifies the relative position of a single REIL instruction inside the sequence of REIL instructions generated from the same native input instruction. What I mean here is that if you have a native instruction at address 0x08 the REIL translators will generate REIL instructions for it at addresses 0x800, 0x801, 0x802, …

This way of translating native instruction addresses to REIL instruction addresses limits us to at most 256 REIL instructions for each native instruction. In practice, even getting to 30 REIL instructions for one native instruction is rare although we have a few outliers that are translated to up to 70 REIL instructions.
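The address calculation described above is easy to express in code. Here is a minimal Python sketch; the function and constant names are chosen for this example and do not come from the BinNavi API.

```python
from typing import List

REIL_SLOTS = 0x100  # REIL addresses reserved per native instruction


def reil_addresses(native_address: int, count: int) -> List[int]:
    """Return the REIL addresses of the instructions generated for one
    native instruction."""
    if not 0 <= count <= REIL_SLOTS:
        raise ValueError("at most 256 REIL instructions per native instruction")
    base = native_address * REIL_SLOTS  # the lowest byte is left for the index
    return [base + index for index in range(count)]
```

For example, reil_addresses(0x08, 3) yields the addresses 0x800, 0x801 and 0x802 from the text.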

There is another big advantage to this address translation method. For any given REIL instruction you can immediately determine its native source instruction. There is no need to consider any additional context around the REIL instruction. This is really important if you want to port the results of a REIL analysis algorithm back to the original code. If you have determined a result for a REIL instruction you can just divide its REIL address by 0x100 and you know the original instruction for which the result holds too.
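In code, mapping a REIL address back to its native instruction is a single integer division. The following Python sketch uses a made-up function name and is not part of the BinNavi API.

```python
from typing import Tuple


def native_origin(reil_address: int) -> Tuple[int, int]:
    """Split a REIL address into (native address, index in the sequence)."""
    return divmod(reil_address, 0x100)
```

For the REIL address 0x802 this returns the native address 0x8 and the index 2, meaning the third REIL instruction generated for the native instruction at 0x08.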

Translating x86 code to REIL code

That’s it for now. If you have any questions about the translation process or REIL in general please leave a comment.

ReCon slides – “Packer Genetics: The Selfish Code” & Bochs+Python

2010-07-16

A few days ago Jose and Ero presented at ReCon some of the latest ideas they have been working on regarding unpacking. We have put our slides up for your viewing pleasure here:

Our slides are also available for download here. Beware that they are merely a visual aid to our live presentation. We will try to remember to announce when the ReCon video comes out so you can follow them there.

In addition, Jose will be presenting on the topic at SyScan Taipei on August 20th. That will be another good chance to catch the info fresh and live.

Bochs and Python

Bochs and our custom Python extensions were among the fundamental tools on which we built our research.

Ero has been keeping the Python extensions up to date for a few years and they are something we use a lot at zynamics. We have attempted to make them public on a few occasions (an old patch is available on the Bochs mailing list) but those attempts failed to make them known to more users. We are frequently reminded at conferences that people would love to play with them, so this time we are making them available through a zynamics GitHub project. The plan is to keep them in sync with all major releases of Bochs. On the GitHub page you can find basic instructions on how to get them working. The patch to apply to the current public version of Bochs (2.4.5 at this time) can be found here.

We will add usage examples to the GitHub wiki as time allows. Also, if there are special requests, we will try to provide examples of how to use the extensions for those cases. Download them, play with them and let us know your thoughts.

We are hiring a new BinNavi developer

2010-07-15

Having already hired Tim and Jose this year as full-time employees, we are now looking to extend our team once again. This time we are looking for a software developer who wants to join the BinNavi team.

BinNavi is a binary code reverse engineering tool that enables reverse engineers to analyze binary code. Customers of BinNavi are primarily vulnerability researchers from companies and governmental organizations of many different nations that try to find new 0-days in closed-source software.

Working on BinNavi means you will be working on a large application with more than 500,000 lines of code. The majority of that code is written in Java (the whole main program); a few tens of thousands of lines are written in C++ (the debuggers and the IDA Pro plugin that gets disassembly data from IDA Pro into a MySQL database). You are a good candidate if you know how to write clean code and you have a reverse engineering background that gives you an idea about what features are useful to our customers.

It is also crucial that you are self-motivated and have a clear idea of where development should be going. At zynamics, there is very little management from above. Rather, the individual teams (like the BinNavi team) decide themselves what features to prioritize next and when to schedule the next release of a project.

There are a few perks that make working for zynamics really pay off. There is the obvious one: you will work on the cutting edge of reverse engineering tool research and development. However, there are others. For example, you can attend as many IT security conferences as you want, provided you give a talk there and the organizers pay for your flight and hotel (which nearly all IT security conferences do, so just submit somewhere and be accepted to speak). There is a nearly unlimited budget for computer science books (all employees know the password to the corporate Amazon account and can buy at will). You will meet many of our customers who come from all walks of life and have amazing stories to tell.

Of course there are downsides to the job, too. The primary issue we face again and again when filling job positions is that we do not want any remote workers. You would have to move to Bochum, Germany for the job and work from our office (working from home two times a week or so is OK). Since we want to fill this position quickly (preferably you would start August 1st but no later than August 15th) we cannot consider candidates that require a work permit that takes longer to process. Except for this, we welcome applications from software developers of all backgrounds.

Please note that this is not a reverse engineering job. On the job you will most likely not be doing a lot of reverse engineering beyond what is required to test BinNavi. What you can do, however, is to implement new code analysis algorithms that improve the usefulness of BinNavi to our customers.

If you are interested in this job, please send an email to info@zynamics.com to request more information. Or just send your resume and some piece of code you wrote that makes us want to hire you.

Las Vegas & the zynamics team

2010-07-14

Along with ReCon, the single most important date in the reverse engineering / security research community is the annual Blackhat/DefCon event in Las Vegas. Most of our industry is there in one form or another, and aside from the conference talks, parties and award ceremonies, there are also a good number of technical discussions (in bars or elsewhere) that take place.

This year, a good number of researchers/developers from the zynamics Team will be present in Las Vegas — alphabetically, the list is:

  1. Ero Carrera
  2. Thomas Dullien/Halvar Flake
  3. Vincenzo Iozzo
  4. Tim Kornau

So, if you wish to meet any of the team to discuss reverse engineering, our technologies, our research, or the performance of the Spanish or German football team at the last World Cup, do not hesitate to drop an email to info@zynamics.com. Vegas is always chaotic, and scheduling a meeting will minimize stress for everyone involved.

Specifically, the following topics are worth meeting over:

  1. Chat with Ero about our unpacking engine (just presented at ReCon) and how it fits into the larger scheme of things (e.g. VxClass)
  2. Meet with Tim or Vincenzo to discuss automated gadget-finding for ROP, or anything involving the ARM/REIL translations
  3. Meet with Thomas/Halvar to discuss VxClass, automated malware clustering, automated generation of “smart” malware signatures etc.

Aside from this, if you are interested in …

  • … boosting your reverse engineering performance by porting symbols from FOSS software into your closed-source disassemblies (BinDiff)
  • … becoming faster at finding bugs by leveraging differential debugging, the REIL intermediate language and static analysis frameworks (BinNavi)
  • … enhancing team-based reverse engineering by pooling accumulated knowledge and sharing information (BinCrowd)
  • … automatically correlating and clustering malware and forensically obtained memory dumps, and automatically deriving detection mechanisms (VxClass)
  • … analyzing malicious PDF files including the embedded JavaScript code (PDF Dissector)

then do not hesitate to drop us an email. We’ll gladly show/explain what our tools/technologies can do.

See you there!

ReCon slides – How to really obfuscate your PDF malware

2010-07-13

Last Friday I was at ReCon in Montreal to give a talk about obfuscated PDF malware. I got the idea for the talk during my work on PDF Dissector where I saw a lot of obfuscated PDF malware. The obfuscation I saw in the wild was mostly very limited and the malware authors did not seem to think things through to the very end. I took the opportunity to think a bit further about the whole topic of PDF malware obfuscation and a few of the results of these thoughts can be seen in the slides below. If you do not have Flash enabled, click here to download the slides.

BinNavi 3.0 Beta 2 released

2010-07-07

Today we have released the second beta of BinNavi 3.0. We are now planning to release the final version on August 1st.

The main thing we changed since the first beta version was to improve the MySQL database format to handle large files better. This became necessary as more and more of our customers try to analyze really big images, like Cisco router dumps, with BinNavi. The second major change was to add compatibility with the new IDA Pro 5.7 which is now the preferred data source for BinNavi disassembly data.  Of course we have also fixed various minor bugs that have been reported by our busy beta testers since the first beta was released.

Not many features were added since the first beta was released. You can see the most important new features of BinNavi 3 in this blog post I wrote when the first beta was released. To learn more about BinNavi please check out the manual on our website.

Screenshot of BinNavi 3.0: Highlighting all uses of the local variable Buffer

If you are already a customer of a zynamics product and you would like to get your hands on the BinNavi 3.0 beta, please send an email to support@zynamics.com.

PDF Dissector 1.3.0 released

2010-07-04

The 1.3.0 release of our PDF malware analysis tool PDF Dissector is primarily a bugfix release to fix some of the bugs introduced in 1.2.0. However, I have also added a cool new feature.

I have added a way to quickly browse through the content of all decoded data streams. This is very useful if you want to quickly see what data streams contain potentially malicious content like embedded Flash files or AcroForms code. To account for binary resources and text resources you can switch between text mode and hexadecimal mode.

The screenshot below shows what the new feature looks like. You can clearly see the embedded Flash file in object 12 (note the Flash file header starting with FWS).
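The same check can be scripted once you have the decoded stream data. The Python sketch below is not part of PDF Dissector. It assumes you already hold the decoded streams as a mapping from object number to bytes, and the function names are made up. In addition to the FWS signature of uncompressed Flash files it also accepts CWS, the signature of zlib-compressed Flash files.

```python
from typing import Dict, List

FLASH_SIGNATURES = (b"FWS", b"CWS")  # uncompressed / zlib-compressed SWF


def looks_like_flash(stream: bytes) -> bool:
    """Check whether decoded stream data starts with a Flash file signature."""
    return stream[:3] in FLASH_SIGNATURES


def find_flash_streams(streams: Dict[int, bytes]) -> List[int]:
    """Return the object numbers whose decoded stream looks like a Flash file."""
    return sorted(
        number for number, data in streams.items() if looks_like_flash(data)
    )
```

A matching signature is only a hint, of course. It tells you which objects deserve a closer look in hexadecimal mode.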

To learn more about PDF Dissector please check out the manual.