Combining Static Analysis with Probabilistic Models to Enable Market-Scale Analysis (POPL 2016 - Research Papers)

Who

Damien Octeau, Somesh Jha, Matthew Dering, Patrick McDaniel, Alexandre Bartel, Hongyu Li, Jacques Klein, Yves Le Traon

Track

POPL 2016 Research Papers

Time Zone

The program is currently displayed in (GMT-05:00) Guadalajara, Mexico City, Monterrey.

Use conference time zone: (GMT-05:00) Guadalajara, Mexico City, MonterreySelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Thu 21 Jan 2016 14:20 - 14:45 at Grand Bay North - Track 1: Learning and verification Chair(s): David Monniaux

Abstract

Static analysis has been successfully used in many areas, from verifying mission-critical software to malware detection. Unfortunately, static analysis often produces false positives, which require significant manual effort to resolve. In this paper, we show how to overlay a probabilistic model, trained using domain knowledge, on top of static analysis results, in order to triage static analysis results. We apply this idea to analyzing mobile applications. Android application components can communicate with each other, both within single applications and between different applications. Unfortunately, techniques to statically infer Inter-Component Communication (ICC) yield many potential inter-component and inter- application links, most of which are false positives. At large scales, scrutinizing all potential links is simply not feasible. We therefore overlay a probabilistic model of ICC on top of static analysis results. Since computing the inter-component links is a prerequisite to inter-component analysis, we introduce a formalism for inferring ICC links based on set constraints. We design an efficient algorithm for performing link resolution. We compute all potential links in a corpus of 10,928 applications in 24 minutes and triage them using our probabilistic approach. We find that over 97.3% of all 489 million potential links are associated with probability values below 0.01 and are thus likely unfeasible links. Thus, it is possible to consider only a small subset of all links without significant loss of information. This work is the first significant step in making static inter-application analysis more tractable, even at large scales.

Damien Octeau

University of Wisconsin and Pennsylvania State University

Somesh Jha

University of Wisconsin, Madison

Matthew Dering

Pennsylvania State University

Patrick McDaniel

Pennsylvania State University

Alexandre Bartel

Technical University Darmstadt

Hongyu Li

Rice University

Jacques Klein

University of Luxembourg

Yves Le Traon