These abstracts are for talks at this event.

NEPLS is a venue for ongoing research, so the abstract and supplemental material associated with each talk is necessarily temporal. The work presented here may be in a state of flux. In all cases, please consult the authors' Web pages for up-to-date information. Please don't refer to these pages as a definitive source.

Fairness Testing

Sainyam Galhotra (UMass Amherst)

Today, software is making more automated decisions with societal
impact, and so it becomes critical to ensure that the software acts
fairly. For example, software determines who gets a loan and who gets
hired, and it is being used at every stage of the criminal justice
system, including computing risk-assessment scores that help determine
who goes to jail and who is set free. Toward that end, we define a
novel measure of software discrimination that captures causal
relationships between sensitive input attributes and software
behavior. Our approach, Themis, measures discrimination in software by
generating efficient, discrimination-testing test suites that,
importantly, do not need an oracle. We evaluate Themis on 20 software
systems, some of which come from prior work with explicit focus on
avoiding discrimination. We find that (1) Themis is effective at
discovering software discrimination, (2) state-of-the-art techniques
for removing discrimination from algorithms fail in many situations,
at times discriminating against as much as 98% of an input subdomain,
(3) Themis optimizations are effective at producing efficient test
suites for measuring discrimination, and (4) Themis is more efficient
on systems that exhibit more discrimination. We thus demonstrate that
fairness testing is a critical aspect of the software development
cycle in domains with possible discrimination and provide initial
tools for measuring software discrimination.

Decomposition Instead of Self-Composition for Proving the Absence of Timing Channels

Paul Gazzillo (Yale University)

We present a novel approach to proving the absence of timing channels.
The idea is to partition the program's execution traces in such a way
that each partition component is checked for timing attack resilience
by a time complexity analysis and that per-component resilience
implies the resilience of the whole program.  We construct a partition
by splitting the program traces at secret-independent branches.  This
ensures that any pair of traces with the same public input has a
component containing both traces.  Crucially, the per-component checks
can be normal safety properties expressed in terms of a single
execution.  Our approach is thus in contrast to prior approaches, such
as self-composition, that aim to reason about multiple k <= 2
executions at once.

We formalize the above as an approach called quotient partitioning,
generalized to any $k$-safety property, and prove it to be sound. A
key feature of our approach is a demand-driven partitioning strategy
that uses a regex-like notion called trails to identify sets of
execution traces, particularly those influenced by tainted (or secret)
data. We have applied our technique in a prototype implementation tool
called Blazer, based on WALA, PPL, and the brics automaton library. We
have proved timing-channel freedom of (or synthesized an attack
specification for) 24 programs written in Java bytecode, including 6
classic examples from the literature and 6 examples extracted from the
DARPA STAC challenge problems.

Artifacts for Semantics: An OCaml Experiment

Daniel Patterson (Northeastern University)

We present a brief experience report and demonstration of an artifact
we built for a semantics paper. The paper presents a multi-language
that incorporates a typed assembly language (TAL) and a simple
functional language where each can be embedded within the other.

The artifact is an online editor, type-checker, and stepper that
allowed readers to run examples from the paper, write their own
programs, and in general experiment with the multi-language. We think
it was successful both as a tool to aid readers, but it also helped us
to ensure our examples were correct and that there were not typos in
the presentation of the formalism. Given the low effort to construct
such an artifact (as compared to, for example, formalizing the results
in a theorem prover), we think this approach is worth considering for
other authors.

For more information, see

The talk will be a brief summary of the approach, and then a short
demo of the artifact.

Web-based Debugging for Free

Sam Baxter and Rachit Nigam (UMass Amherst)

Web-based programming environments lack many basic features that
programmers expect to find in desktop-based IDEs. For example, given a
non-terminating program, most environments simply crash the
browser. Some environments implement "infinite loop detectors", but
these sacrifice the ability to execute long-running, interactive
programs. The few exceptions that handle this correctly, such as Pyret
and WeScheme, have required tremendous effort to build and are tightly
coupled to specific programming languages.

We present Stopify, a new approach to building web-based programming
environments, that supports any language that compiles to JavaScript
and generates source maps. Stopify transforms the JavaScript produced
by an ordinary compiler and implements a runtime scheduler that adds
support for pausing, resuming, stepping, and break-pointing at the
source language level.

Following this approach, we present a web-based debugger that supports
three major languages.

Automated System Configuration Repair

Aaron Weiss (UMass Amherst)

Modern system configuration languages provide powerful abstractions
that make system administration easier. These languages are widely
used. For example, over 33,000 organizations use Puppet, including 75
of the Fortune 100. Still, configuration scripts often contain bugs
and the shell is the simplest way to diagnose them. Unfortunately,
fixing bugs from the shell is unsafe as it causes the system to drift
from the state specified in the script. Thus, configuration languages,
despite their advantages, force system administrators to give up the
simplicity and familiarity of the shell.

This talk will present a synthesis-based technique that allows
administrators to use configuration languages and the shell in
harmony. We allow administrators to fix errors from the shell and the
technique automatically repairs the high-level configuration script to
stay in sync. The approach (1) produces sound repairs, (2) produces
maintainable repairs, (3) ranks and presents multiple repairs when
relevant, and (4) supports any shell.

We've implemented our approach for Puppet, the most widely used system
configuration language. We evaluate our system on existing benchmarks
under 42 repair scenarios. We find that the top-ranked repair is
selected by humans 76% of the time and that the human-equivalent
repair is ranked 1.310 on average.

Static Detection of Event-based Races in Android Apps

Yongjian Hu (UC Riverside) and Iulian Neamtiu (New Jersey Institute of Technology)

Event-based races are the predominant source of concurrency errors in
Android apps. So far all the approaches for detecting event-based
races have been dynamic. Due to their dynamic nature, these approaches
suffer from coverage and false negative issues, and despite being
dynamic they still have a high rate of false positives.  We introduce
a static approach and tool, named SIERRA, for detecting Android
event-based races centered around a new concept of ``concurrency
action'' (that reifies threads, events/messages, system and user
actions) and statically-derived order (happens-before relation)
between actions. Establishing action order is complicated in Android,
and event-based systems in general, because of externally-orchestrated
control flow, use of callbacks, asynchronous tasks, and ad-hoc
synchronization.  Our approach introduces several key enablers for
inferring order relations statically: models which impose order among
lifecycle and GUI events; static analysis refinements, e.g.,
path-sensitive pointer analysis; and finally, on-demand path
sensitivity via backward symbolic execution to further rule out false
positives. We evaluate SIERRA on 194 Android apps. Of these, we chose
20 apps for manual analysis and comparison with a dynamic race
detector. Experimental results show that our approach is effective and
efficient, typically taking 1,230 seconds to analyze an app and
revealing 33 potential races. Moreover, SIERRA can refute false
positive races reported by the dynamic detector.  Our approach opens
the way for precise analysis and static event race detection in other
event-driven systems beyond mobile platforms.

Generic Coq Library for Certified Static Checking of Module-like Language Constructs

Julia Belyakova (Southern Federal University, Russia)

As a means of abstraction, many programming languages provide some
sort of module-like constructs. Examples are interfaces in
object-oriented languages, signatures in ML, type classes in Haskell.
All of these constructs share some common properties: they define
pairs of "interface" and "implementation" components (e.g. interfaces
and classes, signatures and modules), introduce namespaces, and obey
some high-level notions of well-definedness. For instance, we can
consider an "implementation" as well-defined if all of its members are
well-defined, and all members required by a corresponding "interface"
are defined.
At the same time, there are various ways to check that the members are
well-defined. For example, it may be enough to take into account only
previously defined members when checking the current one (as in
ML-module without mutual recursion); or it may be the case that all
members are considered at every point (as in Java-class). Also,
checking well-definedness of a self-contained class may be different
from checking of a class, which extends another one.
In this talk we present an ongoing work on the generic Coq library for
certified static checking of module-like constructs of different
flavours. It is intended to be used in certified
compilers/interpreters, which seek for both efficient representation
of language constructs and easiness-to-reason-about them. The library
defines three building blocks for several concrete notions of
modules. The first one, propositional, provides definitions of
well-definedness; the second one, computational, implements efficient
algorithms for checking modules; and the third one provides proofs of
correctness of algorithms (from the second block) with respect to the
propositional definitions (the first block).

Copattern-matchings and first-class observations at Work

Paul Laforgue (Northeastern University & Diderot University)

Infinite data structures are elegantly defined using
copattern-matching, a dual construction to pattern-matching that
expresses the outcomes of the observations of an infinite object.

We extended the OCaml programming language with copatterns, exploiting
the duality between pattern-matching and copattern-matching.  Provided
that a functional programming language has GADTs, every
copattern-matching can be transformed into a pattern-matching via a
purely local syntactic transformation, a macro.

The development of this extension leads us to a generalization of
previous calculus of copatterns: the introduction of first-class
observation queries.

During this talk, we will demonstrate how codata types and copatterns
help to understand and manipulate the infinite in a more natural way.
We will implement infinite objects, starting with streams and ending
with more sophisticated ones, such as the implementation of the game
of life using an infinite zipper.

Gradual Types for Objects Redux

Benjamin Chung (Northeastern University)

The ideal of gradual typing is to add types to untyped languages, a
motivation that researchers have taken up with great enthusiasm,
developing a wide range of approaches for combining sound types with
untyped code. However, these new techniques are typically presented
for a specific language and inherit all the complexity of that
language, muddying comparisons, making extension more difficult, and
obscuring potential implementations.

To clarify the differences between the approaches and to allow
extension to new languages, we present a common, object-oriented, core
calculus for gradual typing inspired by the CIL, called KafKa, and
translations to KafKa from several notable gradual type systems,
including Typed Racket, Thorn, and the Transient semantics in
Reticulated Python. By using a commodity VM inspired core calculus, we
simultaneously illustrate implementation approaches for gradually
typed programming languages, while simultaneously highlighting the
semantics of each approach in the translations.

Debugging Memory Safety Errors in Memory Managers

Karl Cronburg (Tufts University)

Debugging a memory manager is hard. While general purpose tools exist
to alleviate the debugging effort, such as GDB and Valgrind, we
believe such tools do not sufficiently support debugging of memory
managers. In this talk, I will present a new approach to debugging
memory managers based on a high-level specification language coupled
with binary instrumentation. Of particular importance to debugging a
memory manager is how it operates. Memory managers operate by
allocating memory according to a stratified protocol whereby memory
tran- sitions across different layers. For example, the Glasgow
Haskell Compiler's (GHC's) memory manager allocates memory into layers
named megablock, block, and object. At each layer memory managers also
bifurcate the memory into allocated and free states: a GHC block can
be either allocated or free.

Measuring Reticulated Python

Ben Greenman (Northeastern University)

Is sound gradual typing dead? In the context of Typed Racket, a
gradual type system that guarantees "classic" type soundness, the cost
of enforcing soundness at runtime is high.

Reticulated is an implementation of gradual typing for Python. It
provides an alternative soundness guarantee that requires less runtime
overhead to enforce.

This talk will explain Reticulated's notion of soundness, question its
practicality, and report our findings on Reticulated's performance.

Finding Optimal Schedules for Generational Garbage Collectors

Nicholas Jacek (UMass Amherst)

In a generational garbage collector, the cost of performing a
collection varies with the amount of data that is live and with the
specific objects that have been promoted - and thus is influenced by
the past sequence of collections.  By carefully choosing the times
when a collection is performed, the overall cost of running the
garbage collector can be reduced.  To this end, we present a dynamic
program that, given an after-the-fact trace of a program's execution
on a Java virtual machine, finds the schedule of major and minor
collections that has the minimum possible cost.  We then give results
that compare optimal and naive schedules on the DaCapo benchmarks
using a wide range of young space and old space sizes.  We also
compare with lower and upper bounds on the optimal that are
potentially even cheaper to compute.  Surprisingly we find that in
practice these bounds are often tight, which may offer further insight
into optimal GC cost.  We also hope to include some measurements of
the cost to compute optimal schedules using our dynamic program
written in the Julia programming language.

This is an update to our prior work.  Previously, we had used
reinforcement learning techniques to find near-optimal schedules.
However, these were only approximately optimal, since finding exactly
optimal schedules would take exponential time.  Our new reformulation
of the problem as a dynamic program allows us to find exactly optimal
schedules in feasible amounts of time, a significant improvement.
Work in progress that we expect to submit for journal publication this

Tracking the Provenance of Machine Learning Models

Nathan Ricci (Microsoft)

We present preliminary work on tracking the origins of machine
learning programs.  Our system takes unmodified Apache Spark programs,
applies dynamic instrumentation, and produces a data structure
recording what files, which functions, what parameters were used to
generate models(along with what other outputs the program nomrally
produces), with minimal over head.

Last modified Friday, June 2nd, 2017 8:50:14pm