Estimating types in binaries using predictive modeling (POPL 2016 - Research Papers)

Who

Omer Katz, Ran El-Yaniv, Eran Yahav

Track

POPL 2016 Research Papers

When

Thu 21 Jan 2016, 10:55 - 11:20, Grand Bay South. Track 2: Probabilistic and statistical analysis. Chair: Aditya Nori

Abstract

Reverse engineering is a crucial tool for mitigating vulnerabilities in binaries. Because much software is developed in C++ and other object-oriented languages, reverse engineering of object-oriented code is of critical importance. One of the major hurdles in reverse engineering binaries compiled from object-oriented code is the use of virtual functions, which results in dynamic dispatch. In the absence of debug information, any dynamic dispatch may appear to jump to many possible targets. This presents a significant challenge to a reverse engineer trying to understand the program flow.

We present a novel technique that allows us to statically determine the likely targets of virtual function calls. Our technique uses object tracelets (statically constructed sequences of operations performed on an object) to capture potential runtime behaviors of the object. Our analysis automatically pre-labels some of the object tracelets by relying on instances where the type of an object is known. The resulting type-labeled tracelets are then used to train a variable-order Markov model (VMM) for each type. We then apply the resulting ensemble of VMMs to unlabeled tracelets to rank their most likely types, from which we deduce the likely targets of dynamic dispatches. We have implemented our technique and evaluated it on real-world C++ binaries. Our evaluation shows that when there are multiple alternative targets, our approach can drastically reduce the number of targets a reverse engineer has to consider.
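The ranking step described in the abstract (one VMM per type, trained on pre-labeled tracelets, then used to score an unlabeled tracelet) can be illustrated with a minimal sketch. Several details here are assumptions, not taken from the paper: tracelet events are encoded as hypothetical strings such as `R8` (read at offset 8), `W16` (write at offset 16) and `C0x10` (call through vtable slot 0x10); the VMM uses a simple interpolated back-off with a fixed weight `alpha`, which may differ from the variant the authors used; and types are ranked by raw log-likelihood. The class names `FileStream` and `Socket` and all tracelets are invented for illustration, and pre-labeling is assumed to have already produced the `labeled` dictionary.

```python
import math
from collections import defaultdict


class VMM:
    """Toy variable-order Markov model over tracelet events.

    Assumption: interpolated back-off smoothing with a fixed weight `alpha`;
    not necessarily the VMM variant used in the paper.
    """

    def __init__(self, max_order, alphabet_size, alpha=1.0):
        self.max_order = max_order
        self.alphabet_size = alphabet_size
        self.alpha = alpha
        # counts[context][event] -> occurrences; a context is a tuple of events
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, tracelets):
        for seq in tracelets:
            for i, event in enumerate(seq):
                # Record the event after every preceding context of length 0..max_order.
                for k in range(min(i, self.max_order) + 1):
                    self.counts[tuple(seq[i - k:i])][event] += 1

    def prob(self, history, event):
        # Start from a uniform distribution, then refine with longer contexts.
        p = 1.0 / self.alphabet_size
        for k in range(min(len(history), self.max_order) + 1):
            table = self.counts.get(tuple(history[len(history) - k:]))
            if not table:
                break  # if this context was never seen, no longer one was either
            total = sum(table.values())
            p = (table.get(event, 0) + self.alpha * p) / (total + self.alpha)
        return p

    def log_likelihood(self, seq):
        return sum(math.log(self.prob(seq[:i], e)) for i, e in enumerate(seq))


def rank_types(models, tracelet):
    """Return (type, log-likelihood) pairs, most likely type first."""
    scores = [(ty, m.log_likelihood(tracelet)) for ty, m in models.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical pre-labeled tracelets, e.g. from objects whose constructor is known.
labeled = {
    "FileStream": [["R8", "C0x10", "R8", "C0x18"], ["R8", "C0x10", "W16"]],
    "Socket": [["W16", "C0x20", "C0x20", "R24"], ["W16", "C0x20", "R24"]],
}
alphabet = {e for ts in labeled.values() for t in ts for e in t}

models = {}
for ty, ts in labeled.items():
    # One extra alphabet slot reserves probability mass for unseen events.
    models[ty] = VMM(max_order=2, alphabet_size=len(alphabet) + 1)
    models[ty].train(ts)

unlabeled = ["R8", "C0x10", "W16"]
print([ty for ty, _ in rank_types(models, unlabeled)])  # ['FileStream', 'Socket']
```

Because each type gets its own model, supporting another type only requires training one more VMM. The top-ranked types would then be mapped to the corresponding vtable entries to narrow the set of candidate call targets; that mapping is not shown here.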

File attachments

Poster (POPL poster.pdf), 275 KiB

Omer Katz

Technion, Israel Institute of Technology

Israel

Ran El-Yaniv

Technion, Israel Institute of Technology

Eran Yahav

Technion