Author Archive

Guest lecture: Architectural Diversity

2010-05-21

At zynamics we believe that good education is something we have to support. Therefore Sebastian and I decided to support Professor Felix Freiling and his two assistants Carsten Willems and Ralf Hund in their class called Software Reverse Engineering at the University of Mannheim, Germany. Sebastian held a lecture about Windows debugger internals and their use in reverse engineering which you can read about here. This week it was my turn to share some knowledge about architectural diversity in reverse engineering.

While architecture diversity is nothing new, still most people think that only x86 and x64 are interesting to look at because of their desktop computer market share. In my lecture I wanted to show that the range of interesting targets is far broader than generally believed. I started the lecture with a cherry picked set of architectures which are quite common in different usage scenarios. These architectures have some interesting differences between them to motivate a need for a more general reverse engineering approach. Even though a variety of general reverse engineering approaches exist I focused on our own approach, the REIL meta language. I gave a short introduction to the features of REIL and a language definition with an emphasis on its simplicity. After presenting small translation examples which show how REIL translation works I started with REIL use case examples. Prior to presenting and demoing register tracking as a simple use case, a very informal description about the underlying MonoREIL framework was presented. MonoREIL is an abstract interpretation framework which ships with BinNavi to assist an analyst in writing algorithms to answer questions about program states using a formally described method. Demoing register tracking and explaining how it works on top of MonoREIL rounded up the lecture.

I was asked to hold an exercise covering all topics of the lecture after presenting which worked out pretty well. I enjoyed being invited to give a lecture in Mannheim and I greatly admire the work which has been put into the lecture in general. If more Universities offered a reverse engineering class it would be a great plus for a lot of students.

The slides which I used for lecturing are in German and available here:

Algorithms for platform independent return-oriented programming (I of III)

2010-04-16

In my last post about the history of return-oriented programming I showed that we are not dealing with a completely new technology when we are talking about return-oriented programming. However, the technology is evolving to a point where even the world of academia thinks it worth discussing it in theoretical conferences. Until recently return-oriented programming has always been platform dependent so that one specific implementation was only able to work on one single platform. To sharpen the point a little further current approaches only target one specific compiler for one platform in general. Even though this is not necessarily the case for variable length instruction sets like the IA-32/64 instruction set, where the search for instruction sequences can be performed without paying attention to the alignment restrictions, for all platforms where alignment is enforced the current approaches are still very limited.

In this post I will start showing a set of algorithms which can be used platform independently to locate suitable instruction sequences for return-oriented programming.

So let’s get started with defining return-oriented programming to get an idea where we are eventually trying to end up with.

The goal is to build a program from existing code chunks of another program or commonly used libraries. A program built from the parts of another binary is called a return-oriented program. The individual parts that form a return-oriented program are named gadgets. A gadget is a sequence of instructions in the target binary that provides a usable operation, such as addition of two registers. To be able to build a program from gadgets, they must be combinable. Gadgets are combinable if they end in an instruction that changes control flow in a way that can be controlled by the attacker. Instructions at the end of gadgets are named free-branch instructions. A free-branch instruction must satisfy the following properties:

  • The control flow must change at this instruction.
  • The target of the control flow must be controllable (free) such that the input from an attacker controled register or memory offset defines the target.

After this small definition of what we are actually looking for we can now move on and discuss the algorithms which we will use to help us find instruction sequences which satisfy the above definition.

As we need a starting point for our search of useful instruction sequences, the first step is to locate all free-branch instructions in the targeted binary (In BinNavi this is a single SQL query). After we have collected all of the free-branch instructions we start with phase one of our algorithms, data collection within the binary. The goal of the data collection phase is to provide us with the following information:

  • What possible paths are usable for gadgets and end in a free-branch instruction.
  • What does the REIL representation of the instructions on the possible paths look like.

If you are not familiar with REILs ([R]everse [E]ngineering [I]ntermediate [L]anguage) concepts be sure to check our article we have posted a while back to get you started.

Now that we have a goal definition for our first phase algorithms we can start looking into the algorithms to get the desired data. To find all paths which could provide useful instruction sequences for gadgets we use an algorithm which traverses backwards through the control flow graph of all the functions where we found free-branch instructions. Using the free-branch instruction as our starting point we walk upwards instruction by instruction until no more instructions are contained in the initial basic block. The following graphic shows examples for ARM and MIPS free-branch instructions.

Examples for free-branch instructions (more…)

A gentle introduction to return-oriented programming

2010-03-12

Hi,

As I have promised in my last post I will start a series about return-oriented programming. I start with a short introduction about the topic. The introduction covers the origin of return-oriented programming, describes what return-oriented programming is and ends with a definition of return-oriented programming which I will use in the future posts. I will also take some of the recent discussions on Twitter into account which showed that even though I thought I did my history research pretty well, there were still some mailing list post missing from my time-line.

Why do we need return-oriented programming ?

Return-oriented programming is a technique which allows an attacker to execute code in the presence of the following defensive measures.

  • Non executable memory segments
  • Code signing

Where does return-oriented programming come from ?

Return-oriented programming is a recently coined term which describes a technique that has been developed in an iterative process in the security community. The terminology return-oriented programming is used for a subset of techniques which can be referred to as code reuse techniques. To understand where return-oriented programming comes from I show some of the milestones of the techniques history.

Buffer overflows were first publicly documented in the Computer Security Technology Planning Study in 1972 (Appendix 1. Incomplete Parameter Checking). To put this in perspective one must remember that even though we now know that this document was published at the time only a small circle of individuals had access to the document then.

A buffer overflow is, in the original form, a very simple error that is introduced if a function does not perform proper bounds checking for the accessed memory. Basically this means the function receives more input data than it can store. Assuming that the overflowed buffer was located on the stack, the attacker can now write a certain amount of data onto the stack where other variables and the return address might be located. Therefore the attacker can hijack the control flow of the current process and perform an arbitrary computation.

The first major attack which used a buffer overflow as the targeted vulnerability was the Morris worm in 1988. But it was not until the late 90s that major operating systems started to have any protection against buffer overflows. For Microsoft operating systems a form of protection against buffer overflows was only added after the Code-Red and Slammer worms had changed their security mindset in 2004.

One of the defensive measures which have been developed to defend against buffer overflows is the option to mark data memory segments as non-executable. This lead to the next evolutionary step towards return-oriented programming.

Return-into-library technique

The return-into-library technique is the root on which all return-oriented exploit approaches are based.

A return-into-library exploit works as follows: After the attacker has hijacked the control flow, a library function he chooses is executed. The attacker has made sure that the stack pointer points into a memory segment he controls. The attacker has set up the data in the memory segment in a way that it provides the right arguments to the library function of his choice. Through this he can execute a function with the needed arguments.

The technique of return-into-library exploits was initially presented publicly by Solar Designer in his 1997 posting to the Bugtraq mailing list. In this mail the groundwork for return-into-library exploiting was presented. The next milestone in the development of the technique was the Phrack article by Nergal which summarized the known techniques and broadened the attack vector by introducing esp shifting which allowed unlimited chaining of function calls to be used within return-into-library exploitation.

Borrowed code chunks technique

With the introduction of hardware-supported non-executable memory segments in combination with the support of 64 Bit CPUs the game changed again and traditional return-into-library exploits ceased to work. This was due to an ABI change which now required that the arguments to a function must be passed in registers rather than on the stack. Stealth developed a new approach that uses chunks of library functions instead of the call to the function itself to still be able to exploit buffer overflows on machines that employed the newly introduced defense. The approach is designed around the idea to locate instruction sequences which pop values from the stack into the right registers for function calls. By using his approach an attacker can use return-into-library exploits with the new ABI. A library which uses this technique in an automated fashion is DEPLib which has been developed by Pablo Sole. This library completely automates the return-oriented approach for Windows operating systems but it lacks support for loops and conditional branches (which is from a practical point of view negligible).

Return-oriented programming

The return-oriented programming technique broadens the attack vector even further by introducing loops and conditional branches for the return-oriented approach. The first academic work published in the field of return-oriented programming is Hovav Shacham’s ”The Geometry of Innocent Flesh on the Bone: Return-into-libc without function Calls (on the x86)” It describes the two major points which get addressed by return-oriented programming in contrast to the rest of the return-into-library techniques.

  • The return-into-library technique has no support for loops and conditional branching.
  • The removal of functions from libraries does not provide any security against return-oriented programming.

For the x86 the approach he uses to find suitable instruction sequences is based on the fact that the x86 uses a variable length instruction set. Therefore it is possible to search for single binary opcodes which alter control flow such as the return instruction (0xC3) and disassemble the binary from this position backwards. Because x86 uses a variable length instruction set the bytes before the return instruction can provide many possible instruction sequences. Shacham also defined the term gadget which describes one useful instruction sequence which performs one useful operation such as addition.

One assumption which Shacham made is that he thought a fixed length instruction set would make the application of return-oriented programming unfeasible. This was shown not to be the case by Ryan Roemers work which targeted the SPARC architecture which can be seen as the anti-thesis to the x86 architecture. One change which he needed to incorporate into his gadget set was that only memory could be used to pass information between gadgets. This is due to the way SPARC passes information in registers by shifting the register window.

The most practical work which has been published in the field of return-oriented programming is the recent work which targeted the AVC Advantage voting system. This work has provided proof that return-oriented programming is a valid tool for the offensive security researcher as no other technique would have been as useful against the Harvard-type architecture upon which the AVC Advantage is build.

What did we learn ?

Return-oriented programming is a recently coined term but the underlying technology has a long history which is based on the work of many security researchers. We have started with its roots in return-into-library attacks and showed how it evolved until today.

In the next post on return-oriented programming I will explain the first steps of my approach to make return-oriented programming platform independently.

Tim

staff++

2010-02-03

Hi everyone,

I am the new member on team zynamics. My name is Tim Kornau. I recently finished my Diploma Thesis at the Ruhr-University Bochum in IT-Security which covered the topic of return-oriented programming for the ARM architecture. I will post a summary of the thesis here in a follow-up blog post soon. For the impatient, you can already go ahead and read it –here-.

Primarily I will be working with Sebastian Porst on BinNavi and extending its capabilities even further. Right now I am working on the new MIPS REIL translator featured in the upcoming BinNavi 3.0 release.

If you have any questions about REIL, BinNavi, ARM, return-oriented programming or are just interested in sharing ideas about the topics, I am happy to talk to you.

I am looking forward to an awesome time at zynamics and a lot of new things to learn and do.