Cleaning spreadsheet data types is a common problem faced by millions of spreadsheet users. Data types such as date, time, name, and units are ubiquitous in spreadsheets, and cleaning transformations on these data types involve parsing and pretty-printing their string representations. This is challenging for users because cleaning such data requires background knowledge about the data itself, and the data is typically non-uniform, unstructured, and ambiguous. Spreadsheet systems and programming languages provide some UI-based and programmatic solutions to this problem, but these are either insufficient for users' needs or beyond their expertise.

In this paper, we present a programming-by-example methodology for cleaning data types that learns the desired transformation from a few input-output examples. We propose a domain-specific language with probabilistic semantics that is parameterized with declarative data type definitions. The probabilistic semantics is based on three key aspects: (i) approximate predicate matching, (ii) joint learning of data type interpretation, and (iii) weighted branches. This probabilistic semantics enables the language to handle non-uniform, unstructured, and ambiguous data. We then present a synthesis algorithm that learns the desired program in this language from a set of input-output examples. We have implemented our algorithm as an Excel add-in and present its successful evaluation on 55 benchmark problems obtained from several online Excel help forums.
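
To make the programming-by-example idea concrete, the following is a minimal toy sketch in Python, not the paper's language or synthesis algorithm. It assumes a tiny hand-written search space: two hypothetical date interpretations (`month-first`, `day-first`) and four output formats, all invented here for illustration. It enumerates every pair and keeps those that reproduce all the given examples, which loosely mirrors the idea of resolving an ambiguous interpretation jointly across examples. It is deterministic and has no probabilistic semantics, approximate matching, or weighted branches.

```python
from datetime import datetime
from itertools import product

# Hypothetical candidate spaces, chosen only for this illustration.
INTERPRETATIONS = {
    "month-first": ["%m/%d/%Y", "%Y-%m-%d", "%d %b %Y"],
    "day-first": ["%d/%m/%Y", "%Y-%m-%d", "%d %b %Y"],
}
OUTPUT_FORMATS = ["%Y-%m-%d", "%d %B %Y", "%B %d, %Y", "%m/%d/%Y"]


def parse(text, formats):
    """Return the first successful parse of text under the formats, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def learn(examples):
    """Return every (interpretation, output format) pair consistent with all examples."""
    consistent = []
    for (name, formats), out_fmt in product(INTERPRETATIONS.items(), OUTPUT_FORMATS):
        ok = True
        for src, expected in examples:
            parsed = parse(src, formats)
            if parsed is None or parsed.strftime(out_fmt) != expected:
                ok = False
                break
        if ok:
            consistent.append((name, out_fmt))
    return consistent


def run(program, text):
    """Apply a learned (interpretation, output format) pair to a new input."""
    name, out_fmt = program
    parsed = parse(text, INTERPRETATIONS[name])
    return None if parsed is None else parsed.strftime(out_fmt)


# The first input is ambiguous on its own (3 April or 4 March);
# the expected output is what disambiguates it.
examples = [("03/04/2015", "2015-04-03"), ("7 Jan 2016", "2016-01-07")]
programs = learn(examples)
```

With these two examples, only the day-first reading paired with the `%Y-%m-%d` output format survives, and `run(programs[0], "21/01/2016")` then formats a previously unseen input in the same way. The sketch assumes an English locale for month names.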

Thu 21 Jan
Times are displayed in time zone: (GMT-05:00) Guadalajara, Mexico City, Monterrey

10:30 - 12:10: Research Papers - Track 2: Probabilistic and statistical analysis at Grand Bay South
Chair(s): Aditya Nori (Microsoft Research, UK)
POPL-2016-papers 10:30 - 10:55
Fan Long (MIT CSAIL), Martin Rinard (Massachusetts Institute of Technology, USA)
POPL-2016-papers 10:55 - 11:20
Omer Katz (Technion, Israel Institute of Technology), Ran El-Yaniv (Technion, Israel Institute of Technology), Eran Yahav (Technion)
POPL-2016-papers 11:20 - 11:45
Krishnendu Chatterjee (IST Austria), Hongfei Fu (IST Austria), Rouzbeh Hasheminezhad (Sharif University), Petr Novotny (IST Austria)
POPL-2016-papers 11:45 - 12:10
Rishabh Singh (Microsoft Research), Sumit Gulwani (Microsoft Research)