These abstracts are for talks at this event.

NEPLS is a venue for ongoing research, so the abstract and supplemental material associated with each talk are necessarily provisional. The work presented here may be in a state of flux. In all cases, please consult the authors' Web pages for up-to-date information, and please do not treat these pages as a definitive source.

**Autobahn: Using Genetic Algorithms to Infer Strictness Annotations**
*Diogenes Nunez (Tufts University)*

Although laziness enables beautiful code, it comes with non-trivial performance costs. The GHC compiler for Haskell has optimizations to reduce those costs, but the optimizations are not sufficient. As a result, Haskell also provides a variety of strictness annotations so that users can indicate program points where an expression should be evaluated eagerly. Skillful use of those annotations is a black art, known only to expert Haskell programmers. In this paper, we introduce AUTOBAHN, a tool that uses genetic algorithms to automatically infer strictness annotations that improve program performance on representative inputs. Users examine the suggested annotations for soundness and can instruct AUTOBAHN to automatically produce modified sources. The talk will begin by detailing the problem. We then describe the algorithm and present a case study in which we improve the performance of a garbage-collector simulator written in Haskell. Finally, we outline our future work. (Joint work with Yisu Remy Wang and Kathleen Fisher.)
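The kind of annotation AUTOBAHN infers can be illustrated with a classic example (ours, not taken from the paper): a left fold whose lazy accumulator builds a long chain of thunks until a bang pattern forces each intermediate value.

```haskell
{-# LANGUAGE BangPatterns #-}

-- A lazy accumulator builds a chain of unevaluated thunks
-- (0 + 1 + 2 + ...) that is only collapsed at the very end,
-- costing memory proportional to the length of the list.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs

-- The bang pattern (!) is the strictness annotation: each
-- intermediate sum is forced immediately, so the fold runs
-- in constant space.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs
```

Both functions compute the same result; AUTOBAHN's job is to discover, by measuring representative runs, which of the many possible annotation sites are actually worth forcing.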

**Extraction of Cost Recurrences from Higher-Order Functional Programs**
*Norman Danner (Wesleyan University)*

For the past few years I have been working on a project with Dan Licata to extract cost recurrences from functional programs. The basic strategy is as follows. We start with a monadic translation, which yields what is essentially a cost-annotated version of the original program. That is followed by a denotational semantics, with different semantics giving different analyses. Of course, it is not quite that simple. For example, it does not suffice to extract just cost; we also need to extract information about an appropriate notion of size (a reasonably standard one yields the recurrences for cost in terms of size that we would expect). I will give a brief overview of the project and its goals, describe what we have accomplished recently, and what we hope to accomplish in the (hopefully not-too-distant) future.
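As a rough illustration of the monadic translation (a sketch under our own naming, not the paper's formulation), cost can be threaded through a computation by pairing each value with a count of charged steps:

```haskell
-- A value together with the number of steps "charged" so far.
newtype Cost a = Cost (a, Int)

instance Functor Cost where
  fmap f (Cost (a, c)) = Cost (f a, c)

instance Applicative Cost where
  pure a = Cost (a, 0)
  Cost (f, c1) <*> Cost (a, c2) = Cost (f a, c1 + c2)

instance Monad Cost where
  Cost (a, c1) >>= k = let Cost (b, c2) = k a in Cost (b, c1 + c2)

-- Charge one unit of cost.
charge :: Cost ()
charge = Cost ((), 1)

-- Cost-annotated insertion into a sorted list: one charge
-- per comparison, so the cost recurrence is T(n) <= n.
insertC :: Int -> [Int] -> Cost [Int]
insertC x [] = pure [x]
insertC x (y : ys)
  | x <= y    = charge >> pure (x : y : ys)
  | otherwise = charge >> ((y :) <$> insertC x ys)

runCost :: Cost a -> (a, Int)
runCost (Cost p) = p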

**Programming with Continuous Spaces**
*Ben Sherman (Massachusetts Institute of Technology)*

Software that manipulates continuous concepts, such as space, time, magnitude, and probability, is expanding its purview to safety-critical domains such as control of autonomous vehicles. While construction of formal proofs has succeeded in ensuring the safety and correctness of software whose nature is discrete, continuous software presents additional challenges. For instance, such software generally uses unsound approximations to the continuous objects, such as floating point rather than real numbers. The remedy is to program with the continuous objects themselves! This is possible, and indeed solves this issue, but raises new issues: the principles for programming with continuous spaces have yet to be elucidated. For instance, it is unclear how to make discrete decisions over continuous spaces. To explore how to program with continuous spaces, we are building a programming language, embedded within Coq, whose types are topological spaces and expressions are continuous maps. Because it's based on a constructive theory of topology known as formal topology, the language has strong computational properties; continuous maps can be run in a meaningful way. We introduce the notion of an overlapping pattern match, where local expressions defined on each open of an open cover can be stitched together to form a global expression. This construction yields a way to make decisions over continuous spaces and highlights the importance of nondeterminism in cyber-physical systems. (Joint work with Luke Sciarappa, Michael Carbin and Adam Chlipala.)
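The idea behind an overlapping pattern match can be sketched outside Coq (this is our illustration, not the authors' library): the exact test x < 0 on a real number is not a continuous decision, but the overlapping cover { x < eps, x > -eps } is, because on the overlap (-eps, eps) either branch is acceptable.

```haskell
-- Illustrative representation: a real number as a function that,
-- given a tolerance, returns a rational approximation within that
-- tolerance of the true value (an assumption of this sketch).
type Real' = Rational -> Rational

-- Decide the overlapping cover { x < eps, x > -eps }.
-- If the approximation at tolerance eps/2 is negative, then
-- x < eps is certain; otherwise x > -eps is certain. Which
-- branch fires on the overlap depends on the approximation,
-- which is exactly the nondeterminism the abstract highlights.
caseApprox :: Rational -> Real' -> Either () ()
caseApprox eps x
  | approx < 0 = Left ()   -- safe: x < approx + eps/2 < eps
  | otherwise  = Right ()  -- safe: x >= approx - eps/2 > -eps
  where
    approx = x (eps / 2)
```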

**WoCMan: Programming with the Wisdom of the Crowd**
*Daniel W. Barowy (University of Massachusetts, Amherst)*

Having a crowd estimate a numeric value is the original inspiration for the notion of the "wisdom of the crowd". However, because estimation operates over continuous values like position or weight, quality-control approaches based on consensus are not applicable. We present WoCMan, a high-level programming framework that automatically obtains high-quality crowdsourced estimates of values. The WoCMan domain-specific language lets programmers concisely write complex estimation tasks with a desired level of confidence, confidence interval, and maximum budget. WoCMan's runtime system implements a novel quality control algorithm that automatically computes sample sizes and obtains high quality estimates from the crowd at low cost. To evaluate WoCMan, we implement four estimation applications, ranging from facial feature recognition to calorie counting from photos of meals. The resulting programs are concise -- under 200 lines of code -- and obtain high-quality estimates from the crowd quickly and inexpensively. (Joint work with Emery D. Berger, Daniel G. Goldstein and Siddharth Suri)
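For intuition about the sample-size computation such a runtime performs, here is the textbook normal-approximation formula (a sketch only; WoCMan's actual algorithm is more sophisticated than this):

```haskell
-- Number of samples needed so that a confidence interval has
-- half-width at most e, given critical value z (e.g. 1.96 for
-- 95% confidence) and an estimate s of the standard deviation:
--   n = ceil((z * s / e)^2)
sampleSize :: Double -> Double -> Double -> Int
sampleSize z s e = ceiling ((z * s / e) ^ (2 :: Int))
```

For example, estimating to within 2 units at 95% confidence with standard deviation 10 requires `sampleSize 1.96 10 2` = 97 workers.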

**Self-Referential Mechanism for Dataflow Matrix Machines and Generalized Recurrent Neural Networks**
*Michael Bukatin (HERE North America LLC)*

It has been known for at least 30 years that recurrent neural networks (RNNs) are Turing-universal. The understanding that neurons with linear activation functions can be used as accumulators to form memory, and that neurons with bilinear activation functions can be used to modulate signals via multiplicative masks (gates), resulting in conditional constructions, also goes back at least that far. Nevertheless, together with many other elegant and useful Turing-universal systems, RNNs belong to the class of "esoteric programming languages". People also tend to de-emphasize the role of linear and bilinear neurons: e.g. modern LSTM and gated recurrent unit networks are usually described as networks of sigmoid neurons augmented with external memory and gating mechanisms, instead of the more straightforward description of them as conventional RNNs built from a mixture of sigmoid, linear, and bilinear neurons. Dataflow matrix machines (DMMs) are a simple generalization of RNNs. While RNNs process streams of numbers, DMMs process streams of representations of arbitrary vectors. This change considerably increases their expressive power, moving them closer to becoming a realistic general-purpose programming system, while retaining the key property of RNNs: the programs are represented by matrices of numbers. In particular, DMMs can process streams of matrices defining the topology and weights of the network itself, resulting in highly expressive self-referential facilities. We present a version of such a self-referential mechanism based on a linear matrix transformer Self acting as an accumulator of the network matrix in question and taking additive updates from the other neurons in the network. (Joint work with Steve Matthews (University of Warwick, UK) and Andrey Radul ("Project Fluid", Cambridge MA).)
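The two de-emphasized neuron types are easy to state concretely (names here are ours, not from the DMM work): a linear neuron accumulates its input stream, forming memory, and a bilinear neuron multiplies a gate against a signal, forming a conditional.

```haskell
-- A linear neuron used as an accumulator: state += input.
linearStep :: Double -> Double -> Double
linearStep state input = state + input

-- A bilinear neuron used as a gate: a 0/1 mask blocks or
-- passes the signal, implementing a conditional.
bilinearGate :: Double -> Double -> Double
bilinearGate gate signal = gate * signal

-- Running a stream through the accumulator yields the running
-- sums, i.e. a memory of everything seen so far.
accumulate :: [Double] -> [Double]
accumulate = drop 1 . scanl linearStep 0
```

The Self neuron described in the abstract is the same accumulator pattern lifted from numbers to whole network matrices, with the additive updates coming from other neurons.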

**In-Depth Enforcement of Dynamic Integrity Taint Analysis**
*Sepehr Amir-Mohammadian (University of Vermont)*

Dynamic taint analysis can be used as a defense against low-integrity data in applications with untrusted user interfaces. An important example is defense against XSS and injection attacks in programs with web interfaces. Data sanitization is commonly used in this context, and can be treated as a precondition for endorsement in a dynamic integrity taint analysis. However, sanitization is often incomplete in practice. We develop a model of dynamic integrity taint analysis for Java that addresses imperfect sanitization with an in-depth approach. To avoid false positives, results of sanitization are endorsed for access control (aka prospective security), but are tracked and logged for auditing and accountability (aka retrospective security). We show how this heterogeneous prospective/retrospective mechanism can be specified as a uniform policy, separate from code. We then use this policy to establish correctness conditions for a program rewriting algorithm that instruments code for the analysis. The rewriting itself is a model of existing, efficient Java taint analysis tools. (Joint work with Christian Skalka.)
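The prospective/retrospective split can be sketched in a few lines (ours, not the paper's Java implementation): sanitization endorses a tainted value so access control can proceed, but the endorsement is also recorded for later audit.

```haskell
-- A value is either tainted (from an untrusted interface) or clean.
data Tainted a = Tainted a | Clean a

-- Stand-in sanitizer for this sketch: strips angle brackets,
-- a deliberately incomplete defense against markup injection.
sanitize :: String -> String
sanitize = filter (`notElem` "<>")

-- Prospective security: sanitize and endorse, so the value may
-- flow onward. Retrospective security: the endorsement is logged,
-- so imperfect sanitization remains accountable after the fact.
endorse :: Tainted String -> (String, [String])
endorse (Clean s)   = (s, [])
endorse (Tainted s) = (s', ["endorsed after sanitization: " ++ s'])
  where
    s' = sanitize s
```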

**Fission: Dynamic Tier-Splitting for JavaScript Web Applications**
*Rian Shambaugh (University of Massachusetts, Amherst)*

In this talk, we present our ongoing work on Fission, a platform that uses dynamic tier-splitting to transparently run a single JavaScript program in the browser and on node.js. Fission uses faceted execution (a form of dynamic information flow control) to run programs securely and an abstract interpretation at runtime to partition programs efficiently. Fission places almost no restrictions on JavaScript. For example, a single function can use both node.js and DOM APIs. Fission also supports characteristic JavaScript features, such as prototype inheritance and eval. Therefore, a key advantage of Fission is that it enables existing JavaScript code to be reused and even allows lots of serialization and communication boilerplate to be deleted. (Joint work with Jane Tangen (University of Massachusetts Amherst), Arjun Guha (University of Massachusetts Amherst) and Jean-Baptiste Jeannin (Samsung Research).)
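Faceted execution, the information-flow mechanism Fission builds on, can be sketched independently of JavaScript (names here are ours): a faceted value carries one view for an authorized observer and another for everyone else, and computation proceeds on both facets so secrets never leak into the public view.

```haskell
-- One value, two views.
data Facet a = Facet { privateView :: a, publicView :: a }

-- Projection at an observation point: True means the observer
-- is authorized to see the private facet.
observe :: Bool -> Facet a -> a
observe True  f = privateView f
observe False f = publicView f

-- Operations apply to both facets in lockstep, so the public
-- view is always computed as if the secret were never present.
fmapFacet :: (a -> b) -> Facet a -> Facet b
fmapFacet g (Facet p q) = Facet (g p) (g q)
```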

**The Role of Constructive Approaches in Data Structure Generation and General Constraint-Solving**
*Ivan Kuraj (Massachusetts Institute of Technology)*

Automated data structure generation is useful for tasks that explore a large space of structured values, such as the space of test inputs in automated testing and the space of solutions that satisfy given constraints. Despite recent advances in automated testing and constraint solving, efficient exhaustive generation of data structures with complex invariants remains a significant challenge. State-of-the-art approaches that use generators represented with constraints achieve expressiveness for a variety of complex structures, but suffer from significant performance penalties. In turn, those based on constructive generators lack the necessary expressiveness for generating complex structures. This talk presents a new constructive approach, both efficient and expressive, for automated generation and validation of complex structures. The new approach is capable of exhaustive, incremental, parallel, and memoized generation from not only finite but also infinite domains, while providing features such as fine-grained control over the generation, structure recognition (checking whether a given structure can be produced by a generator), and lazy generation (achieving exhaustive testing without generating all structures). The approach achieves better performance and shorter specifications by up to a few orders of magnitude compared to the state of the art. In addition, the talk explores extensions of this approach, future directions for its integration with other approaches, and its potential role in tasks such as testing complex systems, software verification, and general constraint solving. (Joint work with Daniel Jackson (MIT) and Viktor Kuncak (EPFL).)
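A minimal constructive generator, far simpler than the system described but in the same spirit, enumerates every binary tree with exactly n internal nodes by construction rather than by filtering:

```haskell
data Tree = Leaf | Node Tree Tree
  deriving (Eq, Show)

-- Exhaustively generate all binary trees with exactly n nodes
-- by splitting the size budget between the two subtrees. Every
-- produced tree satisfies the size invariant by construction,
-- so no candidate is ever generated and rejected.
treesOfSize :: Int -> [Tree]
treesOfSize 0 = [Leaf]
treesOfSize n =
  [ Node l r
  | k <- [0 .. n - 1]
  , l <- treesOfSize k
  , r <- treesOfSize (n - 1 - k)
  ]
```

The counts follow the Catalan numbers (1, 1, 2, 5, 14, ...); the approach in the talk extends this constructive style with memoization, incrementality, and invariants far richer than size.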

**Algebraic Databases**
*Ryan Wisnesky (Categorical Informatics)*

In this talk we describe a new approach to expressive database query languages that is a conceptual dual to LINQ. We formalize database schemas as algebraic (multi-sorted equational) theories obeying certain restrictions. Hence, a schema can be thought of as a type theory, and morphisms of schemas are morphisms of type theories. We also formalize database instances as equational theories, whose meaning is their initial term model. Schemas form a category, and the database instances on a schema X form a category, X-Inst. Associated with a schema morphism F : S -> T are three functors, Sigma_F : S-Inst -> T-Inst (corresponding to substitution) along with a right adjoint Delta_F : T-Inst -> S-Inst which in turn has a right adjoint Pi_F : S-Inst -> T-Inst. These operations form an alternative basis for both database query (where we compare them to relational algebra) and database integration (where we compare them to embedded implicational dependencies). We say that our approach is a dual to LINQ because we embed programs (in the guise of equational theories) into schemas, rather than embed schemas (in the guise of types) into programs as LINQ does. We demonstrate a prototype information integration tool based on our approach. (Joint work with Patrick Schultz (MIT) and David Spivak (MIT).)
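For intuition about Delta_F, here is a drastically flattened sketch (ours; it ignores the equational structure entirely): if an instance is just a map from symbols to rows, then Delta_F is precomposition with the schema morphism.

```haskell
-- An instance on a schema assigns rows to each schema symbol.
-- (In the talk's setting instances are equational theories; this
-- flattening is purely for intuition.)
type Instance sym row = sym -> [row]

-- Delta_F : T-Inst -> S-Inst is precomposition: to interpret an
-- S-symbol in the pulled-back instance, translate it along F and
-- look it up in the T-instance.
deltaF :: (s -> t) -> Instance t row -> Instance s row
deltaF f inst = inst . f
```

Sigma_F and Pi_F then arise as the left and right adjoints of this precomposition, which is where the real content of the approach lies.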

There are two preprints associated with this topic, as well as a project Web site: preprint 1, preprint 2, project Web site

Last modified Monday, October 3rd, 2016 11:48:32am