Write a Blog >>

Cleaning spreadsheet data types is a common problem faced by millions of spreadsheet users. Data types such as date, time, name, and units are ubiquitous in spreadsheets, and cleaning transformations on these data types involve parsing and pretty printing their string representations. This presents many challenges to users because cleaning such data requires some background knowledge about the data itself and moreover this data is typically non-uniform, unstructured, and ambiguous. Spreadsheet systems and Programming Languages provide some UI-based and programmatic solutions for this problem but they are either insufficient for the user’s needs or are beyond their expertise.

In this paper, we present a programming by example methodology of cleaning data types that learns the desired transformation from a few input-output examples. We propose a domain specific language with probabilistic semantics that is parameterized with declarative data type definitions. The probabilistic semantics is based on three key aspects: (i) approximate predicate matching, (ii) joint learning of data type interpretation, and (iii) weighted branches. This probabilistic semantics enables the language to handle non-uniform, unstructured, and ambiguous data. We then present a synthesis algorithm that learns the desired program in this language from a set of input-output examples. We have implemented our algorithm as an Excel add-in and present its successful evaluation on 55 benchmark problems obtained from several online Excel help forums.

Thu 21 Jan

Displayed time zone: Guadalajara, Mexico City, Monterrey change

10:30 - 12:10
Track 2: Probabilistic and statistical analysisResearch Papers at Grand Bay South
Chair(s): Aditya Nori Microsoft Research, UK
10:30
25m
Talk
Prophet: Automatic Patch Generation via Learning from Successful Patches
Research Papers
Fan Long MIT CSAIL, Martin C. Rinard Massachusetts Institute of Technology, USA
Media Attached
10:55
25m
Talk
Estimating types in binaries using predictive modeling
Research Papers
Omer Katz Technion, Israel Institute of Technology, Ran El-Yaniv Technion, Israel Institute of Technology, Eran Yahav Technion
Media Attached File Attached
11:20
25m
Talk
Algorithmic Analysis of Qualitative and Quantitative Termination Problems for Affine Probabilistic Programs
Research Papers
Krishnendu Chatterjee IST Austria, Hongfei Fu IST Austria, Rouzbeh Hasheminezhad Sharif University, Petr Novotný IST Austria
Media Attached
11:45
25m
Talk
Transforming Spreadsheet Data Types using Examples
Research Papers
Rishabh Singh Microsoft Research, Sumit Gulwani Microsoft Research
Media Attached