Wed 20 Jan 2016 16:55 - 17:20 at Grand Bay North - Track 1: Language Design Chair(s): David Walker

Regression formulas are a domain-specific language adopted by several R packages for describing an important and useful class of statistical models: generalized multilevel linear regressions. Formulas are succinct, expressive, and clearly popular, so are they a useful addition to probabilistic programming languages? And what do they mean?

In the first half of the paper, we propose a core calculus of hierarchical linear regression, in which regression coefficients are themselves defined by nested regressions. We explain how our calculus captures the essence of the formula DSL found in R. We define a type system, and a semantics of the calculus by translation to probabilistic expressions interpreted as measures via the probability monad. We prove that well-typed regression formulas are explained as well-typed probabilistic expressions. We develop an equational theory for reasoning about regressions and prove soundness. Hence, we transform multilevel regressions to provably equivalent single-level regressions, formalizing for the first time an idea known by example from the statistical literature.

In the second half, we describe the design and implementation of Fabular, a version of the Tabular schema-driven probabilistic programming language, enriched with formulas based on our regression calculus. We give examples to show that due to latent variables, Fabular can usefully express more models than formulas in R, and that due to formulas, Fabular allows surprisingly concise expression of existing models. We propose a new vectorised interpretation of formulas, with several significant use-cases, including a compact rendering of the large-scale Bayesian recommender system Matchbox.
To the best of our knowledge, this is the first semantic description of the core ideas of R’s formula notation, the first development of a calculus of regression formulas, and the first demonstration of the benefits of composing regression formulas and latent variables in a probabilistic programming language.
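
As a rough illustration of what it means to read a regression formula as a probabilistic program, the sketch below gives a generative reading of a varying-intercept model of the kind written `y ~ x + (1 | group)` in lme4-style R notation. It is not the paper's calculus and not Fabular syntax: the function name `sample_hierarchical_regression`, the Gaussian priors, and all scale parameters are assumptions chosen only for this example.

```python
import random


def sample_hierarchical_regression(xs, groups, n_groups, seed=0):
    """Draw one synthetic data set from a varying-intercept regression.

    Illustrative only: the priors and scales are assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    slope = rng.gauss(0.0, 1.0)  # coefficient shared by every group
    mu = rng.gauss(0.0, 1.0)     # population-level intercept
    # Nested regression: each group's intercept is itself drawn around mu.
    intercepts = [rng.gauss(mu, 0.5) for _ in range(n_groups)]
    # Each observation uses its own group's intercept plus Gaussian noise.
    ys = [
        intercepts[g] + slope * x + rng.gauss(0.0, 0.1)
        for x, g in zip(xs, groups)
    ]
    return {"slope": slope, "mu": mu, "intercepts": intercepts, "ys": ys}


sample = sample_hierarchical_regression(
    xs=[0.0, 1.0, 2.0, 3.0], groups=[0, 0, 1, 1], n_groups=2, seed=42
)
```

The group intercepts here play the role of coefficients that are "themselves defined by nested regressions": they are latent quantities sampled around a shared mean rather than fixed parameters.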

Wed 20 Jan

Displayed time zone: Guadalajara, Mexico City, Monterrey

16:30 - 17:45
Track 1: Language Design, Research Papers at Grand Bay North
Chair(s): David Walker (Princeton University)
16:30
25m
Talk
Dependent Types and Multi-Monadic Effects in F*
Research Papers
Nikhil Swamy (Microsoft Research), Cătălin Hriţcu (INRIA Paris), Chantal Keller (MSR-INRIA), Aseem Rastogi (University of Maryland, College Park), Antoine Delignat-Lavaud (INRIA), Simon Forest (ENS), Karthikeyan Bhargavan (INRIA), Cédric Fournet (Microsoft Research), Pierre-Yves Strub (IMDEA Software Institute), Markulf Kohlweiss (Microsoft Research), Jean-Karim Zinzindohoué (INRIA), Santiago Zanella-Béguelin (Microsoft Research)
16:55
25m
Talk
Fabular: Regression Formulas as Probabilistic Programming
Research Papers
Johannes Borgström (Uppsala University), Andrew D. Gordon (Microsoft Research and University of Edinburgh), Long Ouyang (Stanford University), Claudio Russo (Microsoft Research), Adam Ścibior (University of Cambridge), Marcin Szymczak (University of Edinburgh)
17:20
25m
Talk
Kleenex: Compiling Nondeterministic Transducers to Deterministic Streaming Transducers
Research Papers
Bjørn Bugge Grathwohl (DIKU, University of Copenhagen), Fritz Henglein (DIKU, Denmark), Ulrik Terp Rasmussen (DIKU, University of Copenhagen), Kristoffer Aalund Søholm (Jobindex, Denmark), Sebastian Paaske Tørholm (Jobindex, Denmark)