New England Programming Languages and Systems Symposium Series (NEPLS): Abstracts

These abstracts are for talks at this event.

NEPLS is a venue for ongoing research, so the abstract and supplemental material associated with each talk is necessarily temporal. The work presented here may be in a state of flux. In all cases, please consult the authors' Web pages for up-to-date information. Please don't refer to these pages as a definitive source.

Autobahn: Using Genetic Algorithms to Infer Strictness Annotations

Diogenes Nunez (Tufts University)

Although laziness enables beautiful code, it comes with non-trivial
performance costs. The ghc compiler for Haskell has optimizations to
reduce those costs, but the optimizations are not sufficient. As a
result, Haskell also provides a variety of strictness annotations so
that users can indicate program points where an expression should be
evaluated eagerly. Skillful use of those annotations is a black art,
known only to expert Haskell programmers. In this paper, we introduce
AUTOBAHN, a tool that uses genetic algorithms to automatically infer
strictness annotations that improve program performance on
representative inputs. Users examine the suggested annotations for
soundness and can instruct AUTOBAHN to automatically produce modified
sources.

The talk will start detailing the problem. Then, we describe the
algorithm, a case study where we improve the performance of a
Haskell-written garbage collector simulator. Finally, we detail our
future work.

(Joint work with Yisu Remy Wang and Kathleen Fisher.)

Extraction of Cost Recurrences from Higher-Order Functional Programs

Norman Danner (Wesleyan University

For the past few years I have been working on a project with Dan
Licata to extract cost recurrences from functional programs. The basic
strategy is as follows.  We start with a monadic translation which
yields what is essentially a cost-annotated version of the original
program.  That is proceeded by a denotational semantics, with
different semantics giving different analyses.  Of course, it is not
quite that simple.  For example, it does not suffice to extract just
cost; we also need to extract information about an appropriate notion
of size (a reasonably standard one yields the recurrences for cost in
terms of size that we would expect).  I will give a brief overview of
the project and its goals, describe what we have accomplished
recently, and what we hope to accomplish in the (hopefully
not-too-distant) future.

Programming with Continuous Spaces

Ben Sherman (Massachusetts Institute of Technology)

Software that manipulates continuous concepts, such as space, time,
magnitude, and probability, is expanding its purview to
safety-critical domains such as control of autonomous vehicles. While
construction of formal proofs has succeeded in ensuring the safety and
correctness of software whose nature is discrete, continuous software
presents additional challenges. For instance, such software generally
uses unsound approximations to the continuous objects, such as
floating point rather than real numbers.

The remedy is to program with the continuous objects themselves! This
is possible, and indeed solves this issue, but raises new issues: the
principles for programming with continuous spaces have yet to be
elucidated. For instance, it is unclear how to make discrete decisions
over continuous spaces. To explore how to program with continuous
spaces, we are building a programming language, embedded within Coq,
whose types are topological spaces and expressions are continuous
maps. Because it's based on a constructive theory of topology known as
formal topology, the language has strong computational properties;
continuous maps can be run in a meaningful way. We introduce the
notion of an overlapping pattern match, where local expressions
defined on each open of an open cover can be stitched together to form
a global expression. This construction yields a way to make decisions
over continuous spaces and highlights the importance of nondeterminism
in cyber-physical systems.

(Joint work with Luke Sciarappa, Michael Carbin and Adam Chlipala.)

WoCMan: Programming with the Wisdom of the Crowd

Daniel W. Barowy (University of Massachusetts, Amherst)

Having a crowd perform estimates of a numeric value is the original
inspiration for the notion the wisdom of the "wisdom of the
crowd". However, because estimation operates over continuous values
like position or weight, quality control approaches based on consensus
are not applicable. We present WoCMan, a high-level programming
framework that automatically obtains high-quality crowdsourced
estimates of values. The WoCMan domain-specific language lets
programmers concisely write complex estimation tasks with a desired
level of confidence, confidence interval, and maximum budget. WoCMan's
runtime system implements a novel quality control algorithm that
automatically computes sample sizes and obtains high quality estimates
from the crowd at low cost. To evaluate WoCMan, we implement four
estimation applications, ranging from facial feature recognition to
calorie counting from photos of meals. The resulting programs concise
-- under 200 lines code -- and obtain high quality estimates from the
crowd quickly and inexpensively.

(Joint work with Emery D. Berger, Daniel G. Goldstein and Siddharth
Suri)

Self-Referential Mechanism for Dataflow Matrix Machines and Generalized Recurrent Neural Networks

Michael Bukatin (HERE North America LLC)

It is known for at least 30 years that recurrent neural nets (RNNs)
are Turing-universal. The understanding that neurons with linear
activation functions can be used as accumulators to form memory, and
that neurons with bilinear activation functions can be used to
modulate signals via multiplicative masks (gates) resulting in
conditional constructions also goes back at least that far.

Nevertheless, together with many other elegant and useful
Turing-universal systems, RNNs belong to the class of "esoteric
programming languages". People also tend to de-emphasize the role of
linear and bilinear neurons: e.g. modern LSTM and gated recurrent unit
networks are usually described as networks of sigmoid neurons
augmented with external memory and gating mechanisms instead of a more
straightforward description of them as conventional RNNs built from a
mixture of sigmoid, linear, and bilinear neurons.

Dataflow matrix machines (DMMs) are a simple generalization of RNNs.
While RNNs process streams of numbers, DMMs process streams of
representations of arbitrary vectors. This change considerably
increases their expressive power moving them closer to becoming a
realistic general-purpose programming system, while retaining the key
property of RNNs: the programs are represented by matrices of numbers.

In particular, DMMs can process streams of matrices defining the
topology and weights of the network itself resulting in highly
expressive self-referential facilities. We present a version of such a
self-referential mechanism based on a linear matrix transformer Self
acting as an accumulator of the network matrix in question and taking
additive updates from the other neurons in the network.

(Joint work with Steve Matthews (University of Warwick, UK) and Andrey
Radul ("Project Fluid", Cambridge MA).)

In-Depth Enforcement of Dynamic Integrity Taint Analysis

Sepehr Amir-Mohammadian (University of Vermont)

Dynamic taint analysis can be used as a defense against low-integrity
data in applications with untrusted user interfaces. An important
example is defense against XSS and injection attacks in programs with
web interfaces. Data sanitization is commonly used in this context,
and can be treated as a precondition for endorsement in a dynamic
integrity taint analysis. However, sanitization is often incomplete in
practice. We develop a model of dynamic integrity taint analysis for
Java that addresses imperfect sanitization with an in-depth
approach. To avoid false positives, results of sanitization are
endorsed for access control (aka prospective security), but are
tracked and logged for auditing and accountability (aka retrospective
security). We show how this heterogeneous prospective/retrospective
mechanism can be specified as a uniform policy, separate from code. We
then use this policy to establish correctness conditions for a program
rewriting algorithm that instruments code for the analysis.  The
rewriting itself is a model of existing, efficient Java taint analysis
tools.

(Joint work with Christian Skalka.)

Fission: Dynamic Tier-Splitting for JavaScript Web Applications

Rian Shambaugh (University of Massachusetts, Amherst)

In this talk, we present our ongoing work on Fission, a platform that
uses dynamic tier-splitting to transparently run a single JavaScript
program in the browser and on node.js. Fission uses faceted execution
(a form of dynamic information flow control) to run programs securely
and an abstract interpretation at runtime to partition programs
efficiently.  Fission places almost no restrictions on JavaScript.
For example, a single function can use both node.js and DOM APIs.
Fission also supports characteristic JavaScript features, such as
prototype inheritance and eval. Therefore, a key advantage of Fission
is that it enables existing JavaScript code to be reused and even
allows lots of serialization and communication boilerplate to be
deleted.

(Joint work with Jane Tangen (University of Massachusetts Amherst), 
Arjun Guha (University of Massachusetts Amherst) and
Jean-Baptiste Jeannin (Samsung Research).)

The Role of Constructive Approaches in Data Structure Generation and General Constraint-Solving

Ivan Kuraj (Massachusetts Institute of Technology)

Automated data structure generation is useful for tasks that explore a
large space of structured values, such as the space of test inputs in
automated testing and the space of solutions that satisfy given
constraints. Despite recent advances in automated testing and
constraint solving, efficient exhaustive generation of data structures
with complex invariants remains a significant
challenge. State-of-the-art approaches that use generators represented
with constraints achieve expressiveness for a variety of complex
structures, but suffer from significant performance penalties. In
turn, those based on constructive generators lack the necessary
expressiveness for generating complex structures.

This talk presents a new, both efficient and expressive, constructive
approach for automated generation and validation of complex
structures. The new approach is capable of exhaustive, incremental,
parallel, and memoized generation from not only finite but also
infinite domains, while providing features such as fine-grained
control over the generation, structure recognition (checking whether a
given structure can be produced by a generator), and lazy generation
(achieving exhaustive testing without generating all structures). The
approach achieves better performance and shorter specifications by up
to a few orders of magnitude compared to the state-of-the-art. In
addition, the talk explores extensions of such approach, future
directions for its integration with other approaches, and its
potential role in tasks such as testing complex systems, software
verification, and general constraint solving.

(Joint work with Daniel Jackson (MIT) and Viktor Kuncak (EPFL).)

Algebraic Databases

Ryan Wisnesky (Categorical Informatics)

In this talk we describe a new approach to expressive database query
languages that is a conceptual dual to LINQ.  We formalize database
schemas as algebraic (multi-sorted equational) theories obeying
certain restrictions.  Hence, a schema can be thought of as a type
theory, and morphisms of schemas are morphisms of type theories.  We
also formalize database instances as equational theories, whose
meaning is their initial term model.  Schemas form a category, and the
database instances on a schema X form a category, X-Inst.  Associated
with a schema morphism F : S -> T are three functors,
  Sigma_F : S-Inst -> T-Inst
(corresponding to substitution) along with a right adjoint
  Delta_F : T-Inst -> S-Inst
which in turn has a right adjoint
  Pi_F : S-Inst -> T-Inst.

These operations form an alternative basis for both database query
(where we compare them to relational algebra) and database integration
(where we compare them to embedded implicational dependencies).  We
say that our approach is a dual to LINQ because we embed programs (in
the guise of equational theories) into schemas, rather than embed
schemas (in the guise of types) into programs as LINQ does.  We
demonstrate a prototype information integration tool based on our
approach.

(Joint work with Patrick Schultz (MIT) and David Spivak (MIT).)

There are two-preprints associated with this topic as well as a project website: preprint 1, preprint 2, project Web site

Last modified Monday, October 3rd, 2016 11:48:32am