
Reverse engineering is a crucial tool in mitigating vulnerabilities in binaries. Because much software is written in C++ and other object-oriented languages, reverse engineering object-oriented code is of critical importance. One of the major hurdles in reverse engineering binaries compiled from object-oriented code is the use of virtual functions, which are invoked through dynamic dispatch. Without debug information, any dynamic dispatch may appear to have many possible targets. This presents a significant challenge to a reverse engineer trying to understand the program flow.

We present a novel technique that allows us to statically determine the likely targets of virtual function calls. Our technique uses object tracelets – statically constructed sequences of operations performed on an object – to capture potential runtime behaviors of the object. Our analysis automatically pre-labels some of the object tracelets by relying on instances where the type of an object is known. The resulting type-labeled tracelets are then used to train a variable-order Markov model (VMM) for each type. We then use the resulting ensemble of VMMs over unlabeled tracelets to generate a ranking of their most likely types, from which we deduce the likely targets of dynamic dispatches. We have implemented our technique and evaluated it over real-world C++ binaries. Our evaluation shows that when there are multiple alternative targets, our approach can drastically reduce the number of targets that have to be considered by a reverse engineer.
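The ranking step described above can be illustrated with a minimal sketch. This is not the implementation evaluated in this work: it replaces the variable-order Markov model with a simpler bounded-order model that backs off to shorter contexts and uses add-one smoothing. The class name, the event strings (such as "call(0)" for a call through vtable slot 0), the type names, and the alphabet size are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


class BackoffMarkovModel:
    """Simplified stand-in for a VMM: bounded-order contexts with backoff.

    Assumption: alphabet_size is at least the number of distinct events.
    """

    def __init__(self, max_order=2, alphabet_size=16):
        self.max_order = max_order
        self.alphabet_size = alphabet_size
        # counts[context][event] = times `event` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, tracelets):
        for tracelet in tracelets:
            for i, event in enumerate(tracelet):
                # Record the event under every context length 0..max_order.
                for k in range(self.max_order + 1):
                    if i - k < 0:
                        break
                    context = tuple(tracelet[i - k:i])
                    self.counts[context][event] += 1

    def _event_prob(self, tracelet, i):
        event = tracelet[i]
        # Use the longest context seen in training, backing off if unseen.
        for k in range(min(self.max_order, i), -1, -1):
            seen = self.counts.get(tuple(tracelet[i - k:i]))
            if seen:
                total = sum(seen.values())
                # Add-one smoothing keeps unseen events at nonzero probability.
                return (seen.get(event, 0) + 1) / (total + self.alphabet_size)
        return 1.0 / self.alphabet_size

    def log_prob(self, tracelet):
        return sum(math.log(self._event_prob(tracelet, i))
                   for i in range(len(tracelet)))


def rank_types(models, tracelet):
    """Return type names ordered from most to least likely for a tracelet."""
    scores = {name: model.log_prob(tracelet) for name, model in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical type-labeled tracelets (events are made-up abstractions).
labeled = {
    "Circle": [["write(0)", "call(0)", "read(8)", "call(1)"],
               ["write(0)", "read(8)", "call(1)"]],
    "Square": [["write(0)", "call(0)", "read(16)", "call(2)"],
               ["write(0)", "read(16)", "call(2)"]],
}

# Train one model per type, then rank the types for an unlabeled tracelet.
models = {}
for type_name, tracelets in labeled.items():
    model = BackoffMarkovModel()
    model.train(tracelets)
    models[type_name] = model

unlabeled = ["call(0)", "read(8)", "call(1)"]
ranking = rank_types(models, unlabeled)
print(ranking)
```

In this toy setting, the head of the ranking would be the type whose virtual table is examined first when resolving a dispatch on the unlabeled object; a real analysis would extract the events from the binary rather than writing them by hand.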

Poster: POPL poster.pdf (275 KiB)