These abstracts are for talks at this event.

NEPLS is a venue for ongoing research, so the abstract and supplemental material associated with each talk are necessarily temporal. The work presented here may be in a state of flux. In all cases, please consult the authors' Web pages for up-to-date information. Please don't refer to these pages as a definitive source.


Knuckledragger: A Low Barrier Proof Assistant

Philip Zucker (Draper Labs)

Many automated verification systems boil down to a sequence of SMT
queries with hand-waving in between. Knuckledragger is an LCF-style
interactive theorem prover designed as a minimal Python library on top
of Z3. Knuckledragger can systematically link SMT queries into a
Hilbert-style proof system. All of the Z3 Python bindings are directly
exposed and available because the terms, sorts, and formulas of
Knuckledragger are literally Z3 Python objects. The Proof objects of
Knuckledragger form a DAG of axiom instantiations and SMT queries. The
basic design is portable between host languages, but leveraging
Python's ecosystem provides a great deal for free in terms of
libraries, tooling, the ML ecosystem, and interactive Jupyter
notebooks. The goal of Knuckledragger is to support software/hardware
verification, numerics, and other applications of interest to the
hackers, scientists, and engineers who are already comfortable in
Python. The project is available at
https://github.com/philzook58/knuckledragger.
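
As a rough illustration of the LCF-style discipline described above (a
minimal sketch written against the stock Z3 Python bindings, not the
actual Knuckledragger API), the kernel can be a single trusted function
that is the only intended way to mint Proof objects, each backed by an
SMT query over previously proven lemmas:

    # Minimal LCF-style sketch over the z3 bindings (not Knuckledragger's API).
    from z3 import Solver, Implies, And, Not, Ints, unsat, BoolRef

    class Proof:
        """A proven formula. Only `lemma` below is meant to construct these."""
        def __init__(self, formula: BoolRef):
            self.formula = formula
        def __repr__(self):
            return f"|- {self.formula}"

    def lemma(goal: BoolRef, by=()) -> Proof:
        """Kernel rule: discharge `goal` with Z3, assuming earlier lemmas."""
        s = Solver()
        for prf in by:
            s.add(prf.formula)   # hypotheses: formulas already proven
        s.add(Not(goal))         # goal follows iff its negation is unsat
        if s.check() != unsat:
            raise RuntimeError(f"lemma failed: {goal}")
        return Proof(goal)

    # Hilbert-style chaining: each query may cite earlier Proof objects.
    x, y = Ints("x y")
    step1 = lemma(Implies(x > 0, x + 1 > 0))
    step2 = lemma(Implies(And(x > 0, y > 0), x + y + 1 > 0), by=[step1])
    print(step2)   # |- Implies(And(x > 0, y > 0), x + y + 1 > 0)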


Nested Family Polymorphism for Proofs

Anastasiya Kravchuk-Kirilyuk (Harvard University)

One of the most annoying things about code is that it evolves over
time. Old definitions expand, and new definitions spring up,
interacting with old code and annihilating its fragile balance. In
tandem, any formal proofs that depend on the changed code must evolve
as well, leading to ailments such as branch confusion (as in, “which
branch has the latest?”) and Sisyphus syndrome. We believe that
programming languages themselves should be solving the problems they
create: languages should have built-in safety features, making code
(and by extension, proofs) modular and reusable by construction. To
this end, our ongoing work builds on our earlier work — Persimmon — to
assess its capabilities as a proof system for modular, extensible
proofs. The original Persimmon is a functional language with type-safe
extensibility features, realized via nested family polymorphism. All
code in Persimmon is polymorphic to the enclosing family, meaning any
inherited or extended code automatically adapts for safe reuse in the
derived family. In our current work, we push the limits of what is
possible (and reasonable) to unleash the hidden potential of nested
family polymorphism for proofs. In this talk, we present a lightning
account of our findings to date.


Scaling GPU Data-Race Freedom With Delinearization

Paul Maynard (University of Massachusetts Boston)

GPU programming is the prevalent computing model in many fields, such
as machine learning, artificial intelligence, and high-performance
computing. GPUs offer the advantage of massive parallelism, but run
the risk of concurrency errors such as data races. Static analysis can
find such bugs without the high resource cost of dynamic methods,
which can be prohibitively expensive in the case of GPUs. A major
challenge in static analysis is handling intricate code, such as
certain high-dimensional data access patterns that are common in ML/AI
code.

In this talk, we present a novel approach that leverages compiler
optimization methods to scale data-race analysis on certain
programs. We show examples of higher-dimensional data accesses that
stump state-of-the-art analysis tools. We also introduce a technique
that reconstructs the missing dimensional information by applying
delinearization, a method borrowed from compiler optimization, making
the analysis more efficient and scalable. Finally, we present our
prototype implementation of this technique and evaluate it, showing
improvements over the state of the art.
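
For readers unfamiliar with the borrowed compiler technique, the
following sketch shows delinearization in isolation: recovering
multi-dimensional subscripts from a flattened index. The dimension
sizes and access pattern are illustrative assumptions, not examples
from the talk.

    # Recover row-major subscripts (i0, i1, ...) from a flat index, given the
    # array dimensions; this is the information a delinearizing analysis
    # reconstructs from linearized GPU accesses.
    def delinearize(flat_index, dims):
        subs = []
        for size in reversed(dims[1:]):
            flat_index, rem = divmod(flat_index, size)
            subs.append(rem)
        subs.append(flat_index)
        return list(reversed(subs))

    # A thread accessing A[flat] with flat = ty * 64 + tx over a 32 x 64 array
    # is really accessing A[ty][tx]:
    assert delinearize(5 * 64 + 7, [32, 64]) == [5, 7]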


Regular language types for the streaming shell

Lukas Lazarek (Brown University), Zekai Li (Brown University)

The shell is a popular, powerful language for building
data-transformation pipelines by composing black-box components. This
composability is a double-edged sword: while it enables the
construction of complex pipelines from simple components, it also
makes it easy to create nonsensical compositions that silently produce
incorrect results---due to the untyped, byte-oriented nature of
inter-command communication. This talk presents Shtreams, a type
system for shell pipelines that uses regular languages to model
pipeline contents and catch composition errors before
execution. Shtreams's regular language types provide a flexible way to
describe command inputs and outputs, feel familiar within the shell
ecosystem, naturally support polymorphism, and enable example-based
error reporting---all while retaining the rich decision properties of
regular languages. As a result, Shtreams is able to automatically and
quickly analyze hundreds of real-world pipelines, catching hundreds of
bugs that no other state-of-the-art system identifies.
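
As a hedged sketch of the kind of check that regular-language types
make possible (not Shtreams's implementation, type syntax, or
analysis), one can model a command's output and the next command's
accepted input as DFAs over an abstract alphabet of record kinds and
decide compatibility by language inclusion:

    from collections import deque

    class DFA:
        """A complete DFA: `delta` must be total over states x alphabet."""
        def __init__(self, alphabet, delta, start, accept):
            self.alphabet, self.delta = alphabet, delta
            self.start, self.accept = start, accept

    def included(a, b):
        """L(a) subset of L(b): search the product automaton for a state
        where `a` accepts but `b` rejects."""
        seen = {(a.start, b.start)}
        queue = deque(seen)
        while queue:
            qa, qb = queue.popleft()
            if qa in a.accept and qb not in b.accept:
                return False   # some word is produced by a but rejected by b
            for sym in a.alphabet:
                nxt = (a.delta[qa, sym], b.delta[qb, sym])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return True

    # Toy pipeline "types" over record kinds (illustrative only): a producer
    # that emits only tab-separated records composes with a consumer that
    # accepts anything, but not vice versa.
    SIGMA = {"tsv", "other"}
    cut_out = DFA(SIGMA, {("ok", "tsv"): "ok", ("ok", "other"): "bad",
                          ("bad", "tsv"): "bad", ("bad", "other"): "bad"},
                  "ok", {"ok"})
    sort_in = DFA(SIGMA, {("any", "tsv"): "any", ("any", "other"): "any"},
                  "any", {"any"})
    print(included(cut_out, sort_in))   # True: the composition type-checks
    print(included(sort_in, cut_out))   # False: arbitrary records may not be TSV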


Vernacular software

Eagon Meng (MIT), Daniel Jackson (MIT)

Where is the source of truth in software? From code, specifications,
and protocols to documentation and issue trackers, every project
sprawls across many mediums. Our troubles in software go far beyond
the sheer technical complexity of areas such as modern web
development, and often boil down to a single question: "What is even
happening?"

Donald Knuth's vision of literate programming provided an enticing but
often misunderstood approach: weave together a fundamentally
human-driven and narratively ordered source of truth with snippets of
code, and let our tools tangle that into something executable. This
does not mean simply interleaving a codebase with documentation, and
it certainly would not result in whatever is going on with
out-of-order cell evaluation in Jupyter notebooks.

Documentation seems to be a second-class citizen in software
today. And with the rise of vibe coding and LLMs throwing their
weight(s) around, can we really curb software complexity? But
wait... are prompts actually causal documentation for code, and are
they normalizing the idea of writing down your intention first?

We explore how concept design, a theory that aligns independent units
of functionality with user understanding, may enable a simpler vision:
a vernacular codebase (with or without LLMs) for end-to-end
development.


Modeling and Analyzing Performance Cost of GPU Kernels

Gregory Blike (University of Massachusetts Boston)

GPU programming has become the de facto choice for high-performance
computation, especially with applications in machine learning, signal
processing, and physics simulation. A programmer’s failure to
understand the unique constraints of programming a GPU can result in
memory performance bugs. Two such bugs, bank conflicts and uncoalesced
accesses, occur when concurrent memory accesses by different threads
become serialized due to characteristics of the underlying
hardware. Static analysis of memory performance in GPU kernels can
help programmers identify memory performance bugs without executing
the programs, by generalizing the performance overheads into a
symbolic expression dependent on a kernel’s configuration, rather than
a dynamic analysis, which is limited to a fixed value dependent on a
particular execution’s data.

In this talk, we introduce a core calculus of warp-semantics which
abstracts the analysis of GPU kernels into a formalization that
accounts for GPU hardware memory constraints. Our abstraction enables
a sound over-approximation of memory performance penalties, and in
many cases, the exact cost. We also introduce and evaluate an
implementation of this abstraction which demonstrates improved
correctness, exactness, and range of analyzable kernels.
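
To make the modeled cost concrete, the sketch below computes the
serialization factor of a single warp-level shared-memory access under
the common 32-bank, 4-byte-wide layout of NVIDIA shared memory. It is
an illustrative toy, not the warp-semantics calculus from the talk,
and the constants and access patterns are assumptions.

    NUM_BANKS, BANK_WIDTH, WARP_SIZE = 32, 4, 32   # typical NVIDIA parameters

    def bank_conflict_degree(byte_addresses):
        """Serialization factor of one warp access: the maximum number of
        distinct 4-byte words that map to a single shared-memory bank."""
        words_per_bank = {}
        for addr in byte_addresses:
            word = addr // BANK_WIDTH
            bank = word % NUM_BANKS
            words_per_bank.setdefault(bank, set()).add(word)
        return max(len(words) for words in words_per_bank.values())

    # Column-wise access to a 32 x 32 float array in shared memory: every
    # thread in the warp hits bank 0, so the access is serialized 32 ways.
    col = [tid * 32 * BANK_WIDTH for tid in range(WARP_SIZE)]
    print(bank_conflict_degree(col))   # 32 (a 32-way bank conflict)

    # Row-wise access: consecutive words fall in distinct banks.
    row = [tid * BANK_WIDTH for tid in range(WARP_SIZE)]
    print(bank_conflict_degree(row))   # 1 (conflict-free)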


Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification

Yuriy Brun (University of Massachusetts Amherst)

Formal verification using proof assistants, such as Coq, allows for
high-quality software. However, the verification process is expensive,
requiring significant expertise and manual effort to write
proofs. Recent work has explored automating proof synthesis using
machine learning, and even more recently, large language models
(LLMs), showing that retrieving relevant premises (such as lemmas and
definitions) is helpful for these models. We present Rango, a fully
automated proof-synthesis tool for Coq that uses not only relevant
premises but also similar proofs from the current project. Rango uses
retrieval augmentation at every step of the proof to automatically
determine which proofs and premises to include in the context of its
fine-tuned LLM. In this way, Rango adapts to the project and to the
evolving state of the proof. We create a new dataset, CoqStoq, of
2,205 open-source Coq projects from GitHub, which includes both
training data and a curated evaluation benchmark of well-maintained
projects. On this benchmark, Rango synthesizes 27.7% of the proofs,
which is 10% more proofs than the prior state-of-the-art tool
Tactician. Our evaluation also shows that adding relevant proofs to
the context in Rango leads to a 45% increase in the number of theorems
proven.
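
The following is a hypothetical outline of retrieval augmentation at
each proof step, not Rango's actual retriever, prompt format, or model
interface; the similarity measure, the `project_proofs` corpus, and
the `model.generate` call are placeholders.

    # Placeholder sketch: rank previously completed proofs from the current
    # project by similarity to the goal and prepend the best ones to the
    # prompt of a fine-tuned language model.
    from difflib import SequenceMatcher

    def similarity(a, b):
        return SequenceMatcher(None, a, b).ratio()

    def retrieve(proof_state, project_proofs, k=3):
        return sorted(project_proofs,
                      key=lambda p: similarity(proof_state, p),
                      reverse=True)[:k]

    def next_tactic(proof_state, project_proofs, model):
        context = "\n\n".join(retrieve(proof_state, project_proofs))
        prompt = (f"Similar proofs:\n{context}\n\n"
                  f"Current goal:\n{proof_state}\nNext tactic:")
        return model.generate(prompt)   # placeholder for the fine-tuned LLM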

This paper will appear in ICSE 2025, where it has won an ACM SIGSOFT Distinguished Paper Award.


Compositional Specification of Concurrent Objects Under Crashes

Arthur Oliveira Vale (Yale University)

In this talk, I will discuss the compositional specification of
parallel programs under full-system crashes, such as those that can be
caused by power outages or critical component failures.

In this context, crash-safety is an important property, as the main
functionality of some systems is resilience to crashes. Toward a
compositional verification approach for crash-safety under full-system
crashes, one observes that crashes propagate instantaneously to all
components across all levels of abstraction, even to unspecified
components, hindering compositionality. In the presence of
concurrency, a correctness criterion that addresses both crashes and
concurrency proves necessary. For this, several adaptations of
linearizability have been suggested, featuring different trade-offs
between complexity and expressiveness. The recently proposed
compositional linearizability framework shows that to achieve
compositionality with linearizability, both a locality property and
an observational refinement property are necessary. Despite that, no
linearizability criterion with crashes has been proven to support an
observational refinement property.

I will present our extension of compositional linearizability to the
full-system crash setting, crash-aware linearizability. Then, I will
compare it with pre-existing linearizability criteria in this context,
such as strict and durable linearizability, and show a few examples of
how crash-aware linearizability may be used to specify crash-resilient
systems and libraries.


Encoding Hypotheses and Experiments with Helical

Emma Tosch (Northeastern University)

Have you ever tried building on someone else's research artifact? For
example, have you tried to reproduce a baseline or apply an existing
technique to a new domain? Which do you find more useful: the artifact
or the academic paper?

In the first half of this talk I will present Helical, a framework for
specifying (some!) hypotheses and experiments at a high level. Helical
consists of a hypothesis language, an experiment language, and a
configuration that captures the elements of an experimental design
that are fixed and those that are free to vary.

In the second half of this talk, I will ask the audience to work in
teams to encode hypothetical research projects and share with the
group what they have observed about this process.


QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning

Zhanna Kaufman (University of Massachusetts Amherst)

Formal verification is a promising method for producing reliable
software, but the difficulty of manually writing verification proofs
severely limits its utility in practice. Recent methods have automated
some proof synthesis by guiding a search through the proof space using
a theorem prover. Unfortunately, the theorem prover provides only the
crudest estimate of progress, resulting in effectively undirected
search. To address this problem, we create QEDCartographer, an
automated proof-synthesis tool that combines supervised and
reinforcement learning to more effectively explore the proof
space. QEDCartographer incorporates the proofs' branching structure,
enabling reward-free search and overcoming the sparse reward problem
inherent to formal verification. We evaluate QEDCartographer using the
CoqGym benchmark. QEDCartographer fully automatically proves 21.4% of
the test-set theorems. Previous search-based proof-synthesis tools
Tok, Tac, ASTactic, Passport, and Proverbot9001, which rely only on
supervised learning, prove 9.6%, 9.8%, 10.9%, 12.5%, and 19.8%,
respectively. Compared to the most effective prior tool,
Proverbot9001, QEDCartographer produces 26% shorter proofs 27% faster,
on average over the theorems both tools prove. Together,
QEDCartographer and non-learning-based CoqHammer prove 31.8% of test
theorems. Our work demonstrates that reinforcement learning is a
fruitful research direction for improving proof-synthesis tools'
search mechanisms.

This work is appearing at ICSE 2025.


CoPy: Declarative Object-oriented programming through copatterns

Adriano Corbelino (University of Massachusetts Lowell)

Declarative programming and equational reasoning are often listed as
strengths of the functional paradigm. However, these can also be
achieved in object-oriented languages with copattern matching and its
dual, pattern matching. While pattern matching allows us to
deconstruct data and branch according to its shape, copattern matching
allows us to define the behavior of potentially infinite objects via a
collection of observations. Those observations are schematics of a
desired matching context that can be substituted by their outcome, an
expression. Copatterns can be nested and mixed with patterns, enabling
a rich and declarative way to refine our context's specifications.

This work extends Python to support a declarative style of defining
objects through copatterns and presents a new encoding of copatterns
in a terse target language, with regular classes and method
overloading, that we named Featherweight Python.

This talk will introduce this declarative style by presenting examples
of infinite objects (such as streams) and a series of arithmetic
evaluators to highlight the compositional capabilities of the system.
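
As a small illustration of the observation style the talk describes
(written here in ordinary hand-rolled Python, not in CoPy's copattern
syntax or its Featherweight Python encoding), an infinite stream of
naturals can be defined entirely by how it answers the observations
head and tail:

    class Nats:
        """The stream 0, 1, 2, ... defined only by its observations."""
        def __init__(self, n=0):
            self.n = n
        def head(self):                 # observation: head(nats n) = n
            return self.n
        def tail(self):                 # observation: tail(nats n) = nats (n + 1)
            return Nats(self.n + 1)

    def take(stream, k):
        """Force the first k observations of a stream."""
        out = []
        for _ in range(k):
            out.append(stream.head())
            stream = stream.tail()
        return out

    print(take(Nats(), 5))              # [0, 1, 2, 3, 4]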


Norman Principles of Design + Empirical CS = Dynamic Visualization Tools in FSM

Andres M. Garced (Seton Hall University), Marco T. Morazán (Seton Hall University), Tijana Minic (University of Washington)

This talk discusses the design of dynamic visualization tools for FSM,
a functional domain-specific programming language embedded in Racket
for the Formal Languages and Automata Theory classroom. The design of
these tools is based on three pillars: the Norman principles of
effective design, empirical data to foster a smooth user experience,
and human-factors studies to measure user perceptions. We use the
Norman principles of effective design to reduce the extraneous
cognitive load on students. To ensure a smooth user experience, we
conducted empirical studies to reduce start-up time and response
times, which are impacted by image generation, volatile memory
demands, and garbage collector interference. These studies focus on
image generation techniques using parallelism, on the use of thunks to
reduce volatile memory demands, and on demanding frequent collections
by an incremental garbage collector. Finally, we present results of
human-factors studies conducted to determine whether students enjoy
the tools and find them useful.


Zero-Cost Lexical Effect Handlers

Cong Ma (University of Waterloo)

Exception handlers—and effect handlers more generally—are language
mechanisms for structured nonlocal control flow. A recent trend in
language-design research has introduced lexically scoped handlers,
which address a modularity problem with dynamic scoping. While
dynamically scoped handlers allow zero-overhead implementations when
no effects are raised, existing implementations of lexically scoped
handlers require programs to pay a cost just for having handlers in
the lexical context. In this talk, we present a novel implementation
strategy for lexically scoped handlers that satisfies the zero-cost
principle—a property otherwise met by most modern compilers supporting
dynamically scoped exception handlers. The key idea is a
type-directed compilation that emits information indicating how
handlers come into the lexical scope. This information guides the
runtime in walking the stack to locate the right handler. The
compilation scheme does not require any reified, lexically scoped
representation of handlers when no effects are raised. We capture the
essential aspects of this compilation scheme using core calculi. We
prove the correctness of the compilation scheme by showing that it
preserves semantics. Empirical results suggest that the implementation
strategy is effective.


Rust isn’t a Silver Bullet for Systems Research (Yet)

Kinan Dak Albab (Brown University)

Over the past decade, systems researchers have explored how they can
leverage language "safety" in Rust to design and build extensible
software systems with valuable security or isolation properties. These
"Rusty" systems aim to guarantee information flow control, isolation
of serverless functions, crash resilience of kernel extension modules,
and other such properties. This talk explores why this approach is
imperfect in practice, resulting in serious violations of the
guarantees Rusty systems seek to provide.

We find that these failures are due to a mismatch between how Rusty
systems extend Rust's properties in new ways and how (hidden) unsafe
code breaks invariants the systems assume. We showcase this with two
realistic examples that cause violations in recently published Rusty
systems. We define a precise vocabulary for talking about safety in
Rust and the properties thought to hold in "safe" code. Finally, we
explore approaches to catch and mitigate these mismatches, including
Sniffer, a new targeted code-auditing tool that uncovers such issues
with the help of static analysis.


From Linearity to Borrowing

Andrew Wagner (Northeastern University)

Linear type systems are powerful because they can statically ensure
the correct management of resources like memory, but they can also be
cumbersome to work with, since even benign uses of a resource require
that it be explicitly threaded through during computation. Borrowing,
as popularized by Rust, reduces this burden by allowing one to
temporarily disable certain resource permissions (e.g., deallocation
or mutation) in exchange for enabling certain structural permissions
(e.g., weakening or contraction). In particular, this mechanism spares
the borrower of a resource from having to explicitly return it to the
lender but nevertheless ensures that the lender eventually reclaims
ownership of the resource.

In this work, we elucidate the semantics of borrowing by starting with
a standard linear type system for ensuring safe manual memory
management in an untyped lambda calculus and gradually augmenting it
with immutable borrows, lexical lifetimes, reborrowing, and finally
mutable borrows. We prove semantic type soundness of our Borrow
Calculus (BoCa) using Borrow Logic (BoLo), a novel domain-specific
separation logic for borrowing. We establish the soundness of this
logic using a semantic model that additionally guarantees that our
calculus is terminating and free of memory leaks. We also show that
our Borrow Logic is robust enough to establish the semantic safety of
some syntactically ill-typed programs that temporarily break but
reestablish invariants.


Last modified Tuesday, April 29th, 2025 9:02:04am