Write a Blog >>
Fri 22 Jan 2016 14:20 - 14:45 at Grand Bay North - Track 1: Synthesis Chair(s): Roberto Giacobazzi

We present a new approach for learning programs from noisy datasets. Our approach is based on two new concepts: a regularized program generator which predicts a candidate program based on a small sample of the entire dataset while avoiding overfitting, and a dataset sampler which carefully samples the dataset by leveraging the candidate program’s score on that dataset. The two components are connected in a continuous feedback-directed loop.

We show how to instantiate our learning approach to three settings: the setting where the dataset contains no noise (as assumed in today’s programming-by-example approaches), the setting where the dataset has a bound on the noise, and the setting where no bound on the noise is known. We then present several new kinds of program synthesizers which target a particular noise setting.

First, we introduce a novel regularized bitstream synthesizer that successfully predicts programs even in the presence of incorrect input/output examples. We show that the synthesizer can detect errors in the examples while combatting overfitting – a major problem in existing synthesis techniques. We also show how the approach can be used in a setting where the dataset grows dynamically via new examples (e.g., provided by a human).

Second, we present a novel technique for constructing statistical code completion systems. These are systems trained on massive datasets of open source programs, also known as “Big Code”. The key idea is to introduce a domain specific language (DSL) over trees and to learn functions in that DSL directly from the dataset. These learned functions then condition the predictions made by the system. This is a flexible and powerful technique which generalizes several existing works as we no longer need to decide a priori on what the prediction should be conditioned (another benefit is that the learned functions act as a natural mechanism for explaining the prediction to the programmer). As a result, our code completion system surpasses the prediction capabilities of state of the art, hard-wired systems.

Slides (popl16-slides.pdf)751KiB

Fri 22 Jan

POPL-2016-papers
14:20 - 16:00: Research Papers - Track 1: Synthesis at Grand Bay North
Chair(s): Roberto GiacobazziUniversity of Verona, Italy
POPL-2016-papers145346880000014:20 - 14:45
Talk
Veselin RaychevETH Zurich, Pavol BielikETH Zurich, Switzerland, Martin VechevETH Zurich, Andreas KrauseETH Zurich
Link to publication DOI Pre-print Media Attached File Attached
POPL-2016-papers145347030000014:45 - 15:10
Talk
James BornholtUniversity of Washington, Emina TorlakUniversity of Washington, Dan GrossmanUniversity of Washington, USA, Luis CezeUniversity of Washington, USA
Pre-print Media Attached
POPL-2016-papers145347180000015:10 - 15:35
Talk
Aws AlbarghouthiUniversity of Wisconsin–Madison, Isil DilligUniversity of Texas, Austin, Arie GurfinkelCarnegie Mellon University
Pre-print Media Attached
POPL-2016-papers145347330000015:35 - 16:00
Talk
Jonathan FranklePrinceton University, Peter-Michael OseraGrinnell College, David WalkerPrinceton University, Steve ZdancewicUniversity of Pennsylvania
Pre-print Media Attached File Attached