Cleaning spreadsheet data types is a common problem faced by millions of spreadsheet users. Data types such as date, time, name, and units are ubiquitous in spreadsheets, and cleaning transformations on these data types involve parsing and pretty-printing their string representations. This presents many challenges to users, because cleaning such data requires background knowledge about the data itself and, moreover, the data is typically non-uniform, unstructured, and ambiguous. Spreadsheet systems and programming languages provide some UI-based and programmatic solutions for this problem, but these are either insufficient for the user’s needs or beyond the user’s expertise.
In this paper, we present a programming-by-example methodology for cleaning data types that learns the desired transformation from a few input-output examples. We propose a domain-specific language with probabilistic semantics that is parameterized with declarative data type definitions. The probabilistic semantics is based on three key aspects: (i) approximate predicate matching, (ii) joint learning of data type interpretation, and (iii) weighted branches. This probabilistic semantics enables the language to handle non-uniform, unstructured, and ambiguous data. We then present a synthesis algorithm that learns the desired program in this language from a set of input-output examples. We have implemented our algorithm as an Excel add-in and present its successful evaluation on 55 benchmark problems obtained from several online Excel help forums.
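To make the idea of learning a transformation from input-output examples concrete, here is a minimal Python sketch. It is not the paper's language or synthesis algorithm: it deterministically enumerates a small, hypothetical set of date formats (`IN_FORMATS`, `OUT_FORMATS`) and keeps every (parse format, print format) pair that reproduces all examples. It has no probabilistic semantics, approximate matching, or weighted branches; it only illustrates how an output example can settle an ambiguous input interpretation. The names `run` and `learn` are assumptions made for this illustration.

```python
from datetime import datetime
from itertools import product

# Hypothetical candidate formats; a real system would derive these
# from declarative data type definitions rather than a fixed list.
IN_FORMATS = ["%d/%m/%Y", "%m/%d/%Y", "%Y-%m-%d"]
OUT_FORMATS = ["%d %b %Y", "%B %d, %Y", "%Y-%m-%d"]


def run(program, text):
    """Apply a (parse format, print format) pair; None if parsing fails."""
    in_fmt, out_fmt = program
    try:
        return datetime.strptime(text, in_fmt).strftime(out_fmt)
    except ValueError:
        return None


def learn(examples):
    """Return every candidate program consistent with all examples."""
    return [
        program
        for program in product(IN_FORMATS, OUT_FORMATS)
        if all(run(program, inp) == out for inp, out in examples)
    ]


# "03/04/2015" could mean 3 April or 4 March; the desired output
# decides which interpretation of the input is intended.
programs = learn([("03/04/2015", "04 Mar 2015")])
result = run(programs[0], "12/25/2015")
```

With the single example above, only the month-first interpretation survives, and the learned program can then be applied to new cells. An example such as `("03/03/2015", "03 Mar 2015")` leaves both day-first and month-first programs consistent, which is the kind of ambiguity that motivates ranking or weighting candidates rather than picking one arbitrarily.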
Thu 21 Jan (displayed time zone: Guadalajara, Mexico City, Monterrey)
10:30 - 12:10 | Track 2: Probabilistic and statistical analysis | Research Papers at Grand Bay South | Chair(s): Aditya Nori (Microsoft Research, UK)
10:30 | 25m | Talk | Prophet: Automatic Patch Generation via Learning from Successful Patches
10:55 | 25m | Talk | Estimating types in binaries using predictive modeling | Omer Katz (Technion, Israel Institute of Technology), Ran El-Yaniv (Technion, Israel Institute of Technology), Eran Yahav (Technion)
11:20 | 25m | Talk | Algorithmic Analysis of Qualitative and Quantitative Termination Problems for Affine Probabilistic Programs | Krishnendu Chatterjee (IST Austria), Hongfei Fu (IST Austria), Rouzbeh Hasheminezhad (Sharif University), Petr Novotný (IST Austria)
11:45 | 25m | Talk | Transforming Spreadsheet Data Types using Examples