We present a new approach for learning programs from noisy datasets. Our approach is based on two new concepts: a regularized program generator, which predicts a candidate program from a small sample of the dataset while avoiding overfitting, and a dataset sampler, which carefully selects that sample by leveraging the candidate program’s score on the full dataset. The two components are connected in a continuous feedback-directed loop.
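To make the loop concrete, here is a minimal Python sketch of the feedback-directed interaction between the two components. The function names `generate` and `score`, the greedy worst-examples sampler, and the fixed iteration count are illustrative assumptions, not the paper’s actual algorithm.

```python
import random

def learn(dataset, generate, score, sample_size, iters=10):
    """Feedback-directed loop: a regularized generator proposes a candidate
    from a small sample; the sampler then re-picks the sample using the
    candidate's per-example score on the full dataset."""
    sample = random.sample(dataset, min(sample_size, len(dataset)))
    best, best_total = None, float("-inf")
    for _ in range(iters):
        prog = generate(sample)  # caller-supplied regularized generator
        scored = [(score(prog, ex), ex) for ex in dataset]
        total = sum(s for s, _ in scored)
        if total > best_total:   # keep the best candidate seen so far
            best, best_total = prog, total
        # Sampler: focus the next sample on the worst-scoring examples.
        scored.sort(key=lambda pair: pair[0])
        sample = [ex for _, ex in scored[:sample_size]]
    return best
```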
We show how to instantiate our learning approach in three settings: one where the dataset contains no noise (as assumed by today’s programming-by-example approaches), one where the noise in the dataset is bounded by a known amount, and one where no bound on the noise is known. We then present several new kinds of program synthesizers, each targeting a particular noise setting.
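One way to picture the three settings is as three different objectives for a candidate program. The sketch below is our own illustration; the threshold `eps`, the weight `lam`, and the `size` complexity measure are assumptions, not values from the paper.

```python
def cost(prog, dataset, setting, eps=0.1, lam=0.2, size=lambda p: 1):
    """Return a cost to minimize over candidate programs (lower is better);
    a synthesizer for each setting picks the candidate with least cost."""
    err = sum(1 for x, y in dataset if prog(x) != y) / len(dataset)
    if setting == "noise-free":
        return 0.0 if err == 0 else float("inf")    # must fit every example
    if setting == "bounded":
        return err if err <= eps else float("inf")  # at most an eps fraction wrong
    return err + lam * size(prog)  # unknown noise: trade fit against complexity
```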
First, we introduce a novel regularized bitstream synthesizer that successfully predicts programs even in the presence of incorrect input/output examples. We show that the synthesizer can detect errors in the examples while combating overfitting, a major problem in existing synthesis techniques. We also show how the approach can be used in a setting where the dataset grows dynamically via new examples (e.g., provided by a human).
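The following toy sketch illustrates how a regularized objective both combats overfitting and flags likely errors: the examples that the best regularized candidate still fails are reported as suspected noise. The candidate pool, the program sizes, and the `lam` weight are hypothetical; the paper’s actual synthesizer works over a bitstream DSL.

```python
def synthesize_with_noise(candidates, examples, lam=0.2):
    """candidates: iterable of (program, size); program maps input -> output."""
    best_prog, best_cost, best_fails = None, float("inf"), []
    for prog, size in candidates:
        fails = [(x, y) for x, y in examples if prog(x) != y]
        cost = len(fails) / len(examples) + lam * size  # fit + regularizer
        if cost < best_cost:
            best_prog, best_cost, best_fails = prog, cost, fails
    return best_prog, best_fails  # failed examples are suspected noise

examples = [(0, 0), (1, 2), (2, 4), (3, 7)]     # (3, 7) is a corrupted pair
candidates = [(lambda x: 2 * x, 1),             # small, general program
              (lambda x: 2 * x + (x == 3), 3)]  # larger program that also fits the noise
prog, noise = synthesize_with_noise(candidates, examples)
print(noise)  # -> [(3, 7)]: regularization rejects the overfitted candidate
```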
Second, we present a novel technique for constructing statistical code completion systems. These are systems trained on massive datasets of open-source programs, also known as “Big Code”. The key idea is to introduce a domain-specific language (DSL) over trees and to learn functions in that DSL directly from the dataset. These learned functions then condition the predictions made by the system. This is a flexible and powerful technique that generalizes several existing works, as we no longer need to decide a priori what the prediction should be conditioned on (another benefit is that the learned functions act as a natural mechanism for explaining the prediction to the programmer). As a result, our code completion system surpasses the prediction capabilities of state-of-the-art, hard-wired systems.
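As an illustration of conditioning predictions on a function of the programming context, here is a tiny count-based completion model in Python. The `cond` function stands in for a learned program in the paper’s tree DSL, and the API-call representation of contexts is our simplifying assumption.

```python
from collections import Counter, defaultdict

def train(pairs, cond):
    """pairs: (context, completion) examples; cond: context -> conditioning key."""
    model = defaultdict(Counter)
    for ctx, completion in pairs:
        model[cond(ctx)][completion] += 1
    return model

def complete(model, cond, ctx):
    key = cond(ctx)  # the key also serves to explain the prediction
    return model[key].most_common(1)[0][0] if model[key] else None

# Hypothetical conditioning function: the two preceding API calls.
cond = lambda ctx: tuple(ctx[-2:])
model = train([(("open", "read"), "close"),
               (("open", "read"), "close"),
               (("open", "write"), "flush")], cond)
print(complete(model, cond, ("open", "read")))  # -> close
```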
Slides: popl16-slides.pdf (751 KiB)
Fri 22 Jan (time zone: Guadalajara, Mexico City, Monterrey)
14:20 - 16:00 | Track 1: Synthesis (Research Papers), Grand Bay North
Chair: Roberto Giacobazzi (University of Verona, Italy)

14:20 (25m) Learning Programs from Noisy Data
Veselin Raychev, Pavol Bielik, Martin Vechev, Andreas Krause (ETH Zurich)

14:45 (25m) Optimizing Synthesis with Metasketches
James Bornholt, Emina Torlak, Dan Grossman, Luis Ceze (University of Washington)

15:10 (25m) Maximal Specification Synthesis
Aws Albarghouthi (University of Wisconsin–Madison), Işıl Dillig (University of Texas at Austin), Arie Gurfinkel (Carnegie Mellon University)

15:35 (25m) Example-Directed Synthesis: A Type-Theoretic Interpretation
Jonathan Frankle (Princeton University), Peter-Michael Osera (Grinnell College), David Walker (Princeton University), Steve Zdancewic (University of Pennsylvania)