Dumping shellcode with Pin

About six weeks ago, when I blogged about the Adobe Reader/Flash 0-day that was making the rounds back then, I talked about generating automated shellcode dumps with Pin. In this post I want to talk a bit about Pin, dynamic binary instrumentation, and the shellcode dumper Pintool we developed at zynamics.

Dynamic binary instrumentation is a technique for analyzing binary files by executing them and injecting analysis code at runtime. The method is not new: it has been used for many years in areas such as program verification, profiling, and compiler optimization. However, despite its amazing power and ease of use, dynamic binary instrumentation is still not widely used by binary code reverse engineers.

The two most important dynamic binary instrumentation tools for binary code reverse engineers are Pin and DynamoRIO. Pin is developed by Intel and provided by the University of Virginia while DynamoRIO is a collaboration between Hewlett-Packard and MIT. Both are free to use but only DynamoRIO is open source.

In general, Pin and DynamoRIO are used in very similar ways. With either tool you write a C or C++ plugin that contains your analysis code, and Pin or DynamoRIO then injects this code into the target process. For most reverse engineering purposes the two are equally useful, and when you talk to reverse engineers who use dynamic binary instrumentation, the choice often seems to be a matter of personal taste. At zynamics we use Pin because the API for analysis code seems cleaner to us.

Let’s get back to shellcode dumping now. The idea behind shellcode detection is rather simple: Whenever an instruction is executed, check if that instruction belongs to a section of a loaded module in the address space of the target process. If that’s the case, then the instruction is considered legit (not shellcode). If, however, the instruction is outside of any module section, and therefore most likely on the stack or allocated heap memory, the instruction is considered shellcode. This heuristic is not perfect, of course. However, it works surprisingly well in practice.
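The check itself fits in a few lines. What follows is a minimal, Pin-independent C++ sketch of the heuristic, not the code of the actual Pintool. The `Section` struct and the names `findSection` and `looksLikeShellcode` are made up for illustration, and the sketch assumes the list of module sections has already been collected from the instrumentation framework.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one section of a loaded module.
struct Section {
    std::string module;   // module name, e.g. "EScript.api"
    std::uint64_t start;  // first address of the section
    std::uint64_t size;   // size of the section in bytes
};

// Returns the section that contains `address`, or nullptr if no
// section of any loaded module contains it.
const Section* findSection(const std::vector<Section>& sections,
                           std::uint64_t address) {
    for (const Section& s : sections) {
        // Written this way to avoid overflow in start + size.
        if (address >= s.start && address - s.start < s.size) {
            return &s;
        }
    }
    return nullptr;
}

// The heuristic: code outside every module section (stack, heap,
// JIT buffers, ...) is treated as shellcode.
bool looksLikeShellcode(const std::vector<Section>& sections,
                        std::uint64_t address) {
    return findSection(sections, address) == nullptr;
}
```

A linear scan like this is fine for a sketch. Since the check runs for every executed instruction, a real tool would probably want something faster, such as a sorted list of ranges with binary search.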

In fact, the Pintool we developed does exactly this. For every executed instruction in the target process it performs the check described in the above paragraph. Until shellcode is found, the Pintool keeps track of up to 100 legit instructions executed before the shellcode. Then, when shellcode is found, it dumps the legit instructions before the shellcode and the shellcode itself. The big value of our Pintool is not that it dumps the shellcode. The big value is that it tells you exactly where control flow is transferred from the legit code to the shellcode. With this information you can quickly find the vulnerability in the exploited program. If you are really interested in the shellcode, you can also just set a breakpoint on the last legit instruction before the shellcode and do a manual analysis from there.
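The "up to 100 legit instructions" bookkeeping can be pictured as a bounded history buffer. The following C++ sketch shows one way to do it; the class name `LegitHistory` and its interface are assumptions made for this example and are not taken from the Pintool source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Keeps the addresses of the most recently executed legit
// instructions, up to a fixed capacity (100 by default).
class LegitHistory {
public:
    explicit LegitHistory(std::size_t capacity = 100)
        : capacity_(capacity) {}

    // Call for every executed instruction that passed the
    // "inside a module section" check.
    void record(std::uint64_t address) {
        if (capacity_ == 0) return;
        if (history_.size() == capacity_) {
            history_.pop_front();  // drop the oldest entry
        }
        history_.push_back(address);
    }

    // Oldest entry first. The last element is the last legit
    // instruction executed so far, i.e. the transfer point once
    // shellcode is detected.
    std::vector<std::uint64_t> snapshot() const {
        return std::vector<std::uint64_t>(history_.begin(),
                                          history_.end());
    }

private:
    std::size_t capacity_;
    std::deque<std::uint64_t> history_;
};
```

When the first shellcode instruction shows up, the tool would write out `snapshot()` as the "Executed before" part and then start dumping the shellcode.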

You can find the complete source of our shellcode dumper Pintool on the zynamics GitHub. The source code is surprisingly short and I did my best to document it. If you have any questions about the source code, please leave a comment on this blog entry or contact me in some other way.

Let’s take a look at the output of a sample run now. You can find the full trace log on GitHub but here are the important parts.

What you always want to look for is the string “Executed before”. It marks the legit instructions executed before control flow is transferred to the shellcode. Ignore the first occurrence of this string. I have not looked closely at what code is recognized as shellcode there, but it might be JIT-compiled Adobe Reader JavaScript code (which also does not belong to any section of any module and is therefore detected as shellcode). The second occurrence of the string is what you want to look at.

[sourcecode]Executed before
0x238C038E::EScript.api  E8 D2 72 F6 FF  call 0x23827665
0x238BBF4F::EScript.api  8B 44 24 04     mov eax, dword ptr [esp+0x4]
0x238BBF53::EScript.api  C6 40 FF 01     mov byte ptr [eax-0x1], 0x1

[ … more EScript.api instructions … ]

0x2D841E82::Multimedia.api  56           push esi
0x2D841E83::Multimedia.api  8B 74 24 08  mov esi, dword ptr [esp+0x8]
0x2D841E87::Multimedia.api  85 F6        test esi, esi
0x2D841E89::Multimedia.api  74 22        jz 0x2d841ead
0x2D841E8B::Multimedia.api  56           push esi

[ … more Multimedia.api instructions … ]

0x2D841E96::Multimedia.api  8B 10        mov edx, dword ptr [eax]
0x2D841E98::Multimedia.api  8B C8        mov ecx, eax
0x2D841E9A::Multimedia.api  FF 52 04     call dword ptr [edx+0x4]

Shellcode:
0x0A0A0A0A::  0A 0A                      or cl, byte ptr [edx]
0x0A0A0A0C::  0A 0A                      or cl, byte ptr [edx]
0x0A0A0A0E::  0A 0A                      or cl, byte ptr [edx]

[ … more shellcode … ][/sourcecode]

The log clearly shows that control flow is transferred from EScript.api (the Adobe Reader JavaScript engine) to Multimedia.api (the Adobe Reader multimedia library) and then to shellcode which does not lie in any module section. You can even see that the exploit code controls the call target read from [edx+0x4] in the last executed instruction of Multimedia.api. With this knowledge you can work back from this point to see what exactly the vulnerability is that was exploited. In the example, the vulnerability was the media.newPlayer use-after-free vulnerability of older Adobe Reader versions. Exploits for it use JavaScript code to trigger a bug in the Multimedia API.
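If you have many logs to go through, finding the transfer point can be scripted. The C++ sketch below returns the last non-empty line before the n-th “Shellcode:” marker. It assumes the log layout of the excerpt above (instruction lines, then a line starting with “Shellcode:”); the function name `lastLineBeforeShellcode` is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Returns the last non-blank line that precedes the n-th (1-based)
// line starting with "Shellcode:", or an empty string if the log
// contains fewer than n such markers.
std::string lastLineBeforeShellcode(const std::string& log,
                                    std::size_t n) {
    std::istringstream in(log);
    std::string line;
    std::string previous;
    std::size_t seen = 0;
    while (std::getline(in, line)) {
        if (line.rfind("Shellcode:", 0) == 0) {  // line starts with marker
            if (++seen == n) return previous;
            previous.clear();
        } else if (line.find_first_not_of(" \t\r") != std::string::npos) {
            previous = line;  // remember the latest non-blank line
        }
    }
    return "";
}
```

For the sample run discussed here you would ask for the second marker, since the first occurrence is the one to ignore.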

In case you are wondering about gaps in the instruction trace (for example at calls), please note that each instruction is only dumped once. So, if a function is called twice, the second call is not dumped to the output file anymore. This behavior was added to keep log files small.
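The dump-once behavior boils down to remembering which addresses have already been written. A minimal C++ sketch, with the hypothetical class name `DumpOnce` (the actual Pintool may keep this state differently):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Remembers which instruction addresses were already written to the
// log so that each instruction is dumped at most once.
class DumpOnce {
public:
    // Returns true the first time `address` is seen and false on
    // every later call with the same address.
    bool shouldDump(std::uint64_t address) {
        return seen_.insert(address).second;
    }

private:
    std::unordered_set<std::uint64_t> seen_;
};
```

This is why a function that is called a second time leaves a gap in the trace: all of its instructions were already seen during the first call.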

I think the shellcode dumper is a good example of a first Pintool. The idea behind it is really simple, and interested readers can improve the actual Pintool in many ways (by dumping register values or improving the shellcode detection heuristic, for example). If you make improvements to the tool, please let us know.

5 Responses to “Dumping shellcode with Pin”

  1. n says:

    tracing only BBs could be good optimization, for the flow analysis use the BB flow in IDA

    • Sebastian Porst says:

      Yep, you’re right. Pin lets you choose between different granularity settings when tracing. I am not totally sure I remember this correctly, but I believe we implemented instruction-level tracing because at one point we modified the script to dump the target locations of dynamic function calls.

  2. Sebastian says:

    Neat! But won’t most exploits use ROP-like
    techniques, so you can’t easily detect “shellcode”
    as it’s of course inside the segments like any
    other code?
    Are there good heuristics that track the stack to
    detect whether someone is ROPing?
    Keep up the good work.

    • Sebastian Porst says:

      Hi Sebastian,

      good point about ROP. I thought like you until I came across my first ROP shellcode. Turns out that the Pintool still works in practice. Here’s why:

      In exploits, often only the first stage of shellcode is a ROP stage. The purpose of the ROP stage is to set up later stages which are then regular shellcode. Then the second (regular) shellcode stage is executed.

      If you come across such an exploit, the Pintool will still detect the second stage of the shellcode. The first stage will still be dumped by the Pintool though. You can find it in the “Executed before” part instead of the “Shellcode” part of the log file because the Pintool thinks the first stage is legit code. Look in the linked post of the Flash/PDF 0-day for an example.

      If your shellcode is 100% ROP the Pintool will not pick it up though.

      I know that there are people who experiment with stack heuristics for detecting ROP shellcode, but I have not heard of anyone having something ready to show to the public.

      • caf says:

        Isn’t detecting ROP a matter of waiting until a RET, then checking that the return address actually points at an instruction that follows a CALL?