Write a Blog >>
Fri 22 Jan 2016 14:20 - 14:45 at Grand Bay North - Track 1: Synthesis Chair(s): Roberto Giacobazzi

We present a new approach for learning programs from noisy datasets. Our approach is based on two new concepts: a regularized program generator which predicts a candidate program based on a small sample of the entire dataset while avoiding overfitting, and a dataset sampler which carefully samples the dataset by leveraging the candidate program’s score on that dataset. The two components are connected in a continuous feedback-directed loop.

We show how to instantiate our learning approach to three settings: the setting where the dataset contains no noise (as assumed in today’s programming-by-example approaches), the setting where the dataset has a bound on the noise, and the setting where no bound on the noise is known. We then present several new kinds of program synthesizers which target a particular noise setting.

First, we introduce a novel regularized bitstream synthesizer that successfully predicts programs even in the presence of incorrect input/output examples. We show that the synthesizer can detect errors in the examples while combatting overfitting – a major problem in existing synthesis techniques. We also show how the approach can be used in a setting where the dataset grows dynamically via new examples (e.g., provided by a human).

Second, we present a novel technique for constructing statistical code completion systems. These are systems trained on massive datasets of open source programs, also known as “Big Code”. The key idea is to introduce a domain specific language (DSL) over trees and to learn functions in that DSL directly from the dataset. These learned functions then condition the predictions made by the system. This is a flexible and powerful technique which generalizes several existing works as we no longer need to decide a priori on what the prediction should be conditioned (another benefit is that the learned functions act as a natural mechanism for explaining the prediction to the programmer). As a result, our code completion system surpasses the prediction capabilities of state of the art, hard-wired systems.

Slides (popl16-slides.pdf)751KiB

Fri 22 Jan

Displayed time zone: Guadalajara, Mexico City, Monterrey change

14:20 - 16:00
Track 1: SynthesisResearch Papers at Grand Bay North
Chair(s): Roberto Giacobazzi University of Verona, Italy
14:20
25m
Talk
Learning Programs from Noisy Data
Research Papers
Veselin Raychev ETH Zurich, Pavol Bielik ETH Zurich, Switzerland, Martin Vechev ETH Zurich, Andreas Krause ETH Zurich
Link to publication DOI Pre-print Media Attached File Attached
14:45
25m
Talk
Optimizing Synthesis with Metasketches
Research Papers
James Bornholt University of Washington, Emina Torlak University of Washington, Dan Grossman University of Washington, USA, Luis Ceze University of Washington, USA
Pre-print Media Attached
15:10
25m
Talk
Maximal Specification Synthesis
Research Papers
Aws Albarghouthi University of Wisconsin–Madison, Isil Dillig University of Texas, Austin, Arie Gurfinkel Carnegie Mellon University
Pre-print Media Attached
15:35
25m
Talk
Example-Directed Synthesis: A Type-Theoretic Interpretation
Research Papers
Jonathan Frankle Princeton University, Peter-Michael Osera Grinnell College, David Walker Princeton University, Steve Zdancewic University of Pennsylvania
Pre-print Media Attached File Attached