These abstracts are for talks at this event.

NEPLS is a venue for ongoing research, so the abstract and supplemental material associated with each talk are necessarily snapshots in time. The work presented here may be in a state of flux. In all cases, please consult the authors' Web pages for up-to-date information. Please don't refer to these pages as a definitive source.


minBT: A Minimal Core Calculus for Behavior Trees

Cynthia Li (Northeastern University)

Behavior trees are used in videogames to describe non-player character
behavior by stringing together conditionals on a sequence of actions:
go to the garden - if that succeeds, pick blueberries - while your
basket's not full, keep picking - if the player comes into view, say
hello! - when you have enough, go to the kitchen and bake a pie. minBT
is a minimal language based around these semantics. Because games
don't necessarily terminate, we're interested in reasoning about
playtraces, user input, and observable state change. It's a small
language, but when we reason about it in terms of continuations,
stateful effects, algebraic equivalences, and bisimulation, it becomes
surprisingly complex. We'd like to share a snapshot of that
complexity.
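
The flavor of these semantics can be sketched in a few lines. The following is an illustrative toy interpreter, not the authors' minBT: the node types, the success/failure-only statuses (no running state), and the blueberry scenario are simplifications invented for exposition.

```python
# Toy behavior-tree interpreter: a Sequence ticks its children in order
# and fails as soon as one child fails, mirroring "do A; if that
# succeeds, do B; ..." from the abstract.
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2

class Action:
    """Leaf node: runs a function over the world, returning a Status."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self, world):
        return self.fn(world)

class Sequence:
    """Composite node: succeeds only if every child succeeds, in order."""
    def __init__(self, *children):
        self.children = children
    def tick(self, world):
        for child in self.children:
            if child.tick(world) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS

def go_to_garden(world):
    world["location"] = "garden"
    return Status.SUCCESS

def pick_blueberries(world):
    # "While your basket's not full, keep picking."
    while world["basket"] < world["basket_size"]:
        world["basket"] += 1
    return Status.SUCCESS

def bake_pie(world):
    world["location"] = "kitchen"
    world["pies"] += 1
    return Status.SUCCESS

npc = Sequence(Action("go", go_to_garden),
               Action("pick", pick_blueberries),
               Action("bake", bake_pie))

world = {"location": "home", "basket": 0, "basket_size": 5, "pies": 0}
status = npc.tick(world)
print(status, world)  # the observable state change after one playtrace step
```

Reasoning about such trees under nontermination and user input is exactly where the continuation- and bisimulation-based treatment the abstract mentions becomes necessary.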


Fractional Permission Session Types

Selene Wu (Boston University)

Session types provide a formal system for modeling and verifying
message-passing concurrent systems. When grounded in linear logic,
they provide a powerful guarantee of deadlock freedom and the absence
of data races. Unfortunately, this also carries the restriction that
channels be treated strictly linearly, which significantly limits the
expressiveness of the resulting language. We seek to improve
expressiveness while maintaining these guarantees by introducing a
formal notion of an immutable operation and a system of borrowing
based on fractional permissions. This allows channels to be shared
between clients without the possibility of deadlocks or data
races. Finally, we demonstrate the efficacy of the language on
real-world protocols, including a queue data structure and an auction
protocol.
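
The borrowing discipline can be illustrated with a back-of-the-envelope model in the style of Boyland's fractional permissions. The class and method names below are ours, not the paper's: any positive fraction permits immutable operations, while linear (mutating) operations require the whole permission.

```python
# Toy fractional-permission accounting for a shared channel: split lends
# half the permission away, join reclaims it, and only the full
# permission (fraction 1) re-enables linear operations.
from fractions import Fraction

class Permission:
    def __init__(self, channel, frac=Fraction(1)):
        self.channel, self.frac = channel, frac

    def split(self):
        """Lend half of this permission to another client."""
        half = self.frac / 2
        self.frac = half
        return Permission(self.channel, half)

    def join(self, other):
        """Reclaim a borrowed permission for the same channel."""
        assert other.channel is self.channel
        self.frac += other.frac

    def can_observe(self):   # immutable operation: any positive fraction
        return self.frac > 0

    def can_send(self):      # linear operation: requires full permission
        return self.frac == 1

p = Permission("chan")
q = p.split()                # share the channel with a second client
assert p.can_observe() and q.can_observe()
assert not p.can_send()      # neither holder may mutate while shared
p.join(q)
assert p.can_send()          # full permission restored
```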


Data Abstraction for Adversarial Security

Jared Pincus (Boston University)

Data abstraction is of growing interest as a means to state and
enforce security policies. In this ongoing work, we consider protocols
which use ML-style data abstraction to enforce absolute, rather than
probabilistic, security against well-typed adversaries; this includes
security games with multiple independent adversarial parties. To prove
security of such protocols, we employ the contextual equivalence
reasoning of ReLoC, a typed relational separation logic built atop
Iris and mechanized in Rocq. We demonstrate our approach by proving
secure a simple example protocol. Future work will pursue more complex
security games, with features such as concurrency, asynchronous
channels, and hybrid strategies combining data abstraction with
cryptography.


Towards Measure-Based Termination for STLC

Aaron Stump (Boston College)

Proving termination of beta-reduction for simply typed lambda calculus
using a measure function is an open problem attributed to Gödel.  In
this talk, I will explain why the problem is challenging and why a
solution would shed new light on the nature of typed reduction.  There
may even be some surprises.


Epistemic Logic for Polyglots

Luis Garcia (Northeastern University)

A pattern in programming is to stitch together code from disparate
programming languages to build a complex program. This pattern is a
manifestation of multilanguage computation, a subject concerning the
interoperation of languages in a greater system. We discuss ongoing
work in developing a type system for multilanguage computation in the
propositions-as-types discipline, drawing types from epistemic modal
logic.


Foundational Metatheory for Type-based Cryptographic Proofs

Conrad Zimmerman (Northeastern University)

Current tools and frameworks for proving security of cryptographic
protocols (EasyCrypt, Squirrel, UC, etc.) require extensive knowledge
about the necessary cryptographic assumptions, and often rely on
intricately-designed, difficult-to-extend proofs. We have developed a
framework that uses types to encode cryptographic properties (in the
computational model), building on techniques used in the Owl verifier.

Our framework uses the real-vs-ideal model, as in other cryptographic
frameworks such as Universal Composability, to encode basic
assumptions about cryptographic primitives. We then leverage
parametricity, information-flow control, CIU equivalence, and other
tools from PL to prove type soundness for primitives. Soundness
ensures that cryptographic properties, such as confidentiality or
non-malleability, hold for these primitives and every properly-typed
protocol that consumes them. Then security of a protocol can be proven
by a simple type checker.

In this talk we will describe what types mean in this setting and how
we can apply tools such as parametricity to the setting of
cryptographic proofs. We will then discuss how this type-based
reasoning enables modular and extensible cryptographic reasoning.
Finally, we will demonstrate how to apply these tools to formal proofs
of security for practical protocols.


Nested permissions for disjointness reasoning

Gavin Rohrer (Boston University)

Scalable verification of concurrent heap-manipulating programs
requires addressing the frame problem. A popular technique to address
framing is separation logic, which introduces a separating conjunction
to express that two pieces of memory are disjoint and can be
manipulated without interference. However, automating these proofs is
difficult due to the complexity of reasoning about the separating
conjunction. To address the challenge of automating frame reasoning,
this talk introduces a programming model called NPL that achieves
framing syntactically, in the program text, rather than in the program
logic. Instead of using a single global heap to model memory, NPL uses
a collection of potentially nested and globally disjoint partial
maps. This disjointness property enables NPL to provide
syntax-directed erasure to automatically transform an NPL program into
an equivalent program with a single global heap which can be compiled
and executed efficiently. The talk demonstrates the benefits of NPL by
showing how it enables verified construction of concurrent data
structures, like a shared vector and a Treiber stack, that require
fine-grained concurrency reasoning. NPL also helps reduce the size of
an existing proof for the Paxos distributed protocol by 33%.
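
The erasure idea can be sketched in miniature. This is an illustrative model, not the NPL implementation: memory is a collection of globally disjoint partial maps, and erasure is the (checked) union into a single global heap.

```python
# Sketch of syntactic framing via disjoint partial maps: because the
# maps never overlap, an update to one map cannot interfere with
# addresses held by another, and erasure to one heap is just a union.
def check_disjoint(maps):
    seen = set()
    for m in maps:
        overlap = seen & m.keys()
        assert not overlap, f"maps overlap at {overlap}"
        seen |= m.keys()

def erase(maps):
    """Transform disjoint partial maps into a single global heap."""
    check_disjoint(maps)
    heap = {}
    for m in maps:
        heap.update(m)
    return heap

stack_region = {0: "node_a", 1: "node_b"}   # e.g. cells of a Treiber stack
vector_region = {10: 3, 11: 1, 12: 4}       # e.g. cells of a shared vector
heap = erase([stack_region, vector_region])
print(sorted(heap))  # [0, 1, 10, 11, 12]
```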


Automated Reasoning for File System Interactions

Eric Zhao (Brown University)

Real-world software programs often interact with the file system to
store and retrieve persistent data. However, verification and
bug-finding techniques have traditionally under-served this domain,
making it challenging to develop strong confidence in the correctness
of such programs. This work presents Accordion, a rigorous foundation
for fully automated, ahead-of-time reasoning about
file-system-interacting programs, based on sound and complete symbolic
execution. The system operates on an archetypal object language
designed to capture realistic programs and file system behaviors. By
maintaining a symbolic representation of all possible file system states, it
can determine the conditions under which a file operation may fail by
exploiting domain-specific insights to achieve a precise yet finite
model. In this talk, we discuss challenges and insights related to the
design of Accordion.
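
The idea of reasoning over every possible file-system state can be illustrated with a toy enumeration. The modeling choices here (a state is just an exists-bit per path; read fails exactly when the path is absent) are ours, not Accordion's.

```python
# Enumerate all symbolic file-system states over a small set of paths
# and collect the conditions under which read(path) fails.
from itertools import product

PATHS = ["/tmp/a", "/tmp/b"]

def read_fails(state, path):
    # In this tiny model, read() fails exactly when the path is absent.
    return not state[path]

def failure_conditions(path):
    """States (exists-bit per path) in which read(path) fails."""
    failing = []
    for bits in product([False, True], repeat=len(PATHS)):
        state = dict(zip(PATHS, bits))
        if read_fails(state, path):
            failing.append(state)
    return failing

conds = failure_conditions("/tmp/a")
# read("/tmp/a") fails in exactly the states where /tmp/a is absent,
# independent of /tmp/b: 2 of the 4 states.
print(len(conds))  # 2
```

A real file system makes this state space unbounded, which is why the domain-specific insights the abstract mentions are needed to keep the symbolic model finite yet precise.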


Formalizing Delinearization on GPU Programs

Paul Maynard (UMass Boston), Tiago Cogumbreiro (UMass Boston)

Data-races are a major source of concurrency errors in GPU
programming. The vast majority of data-races are caused by array
accesses. The state of the art in static data-race freedom (DRF)
analysis relies on satisfiability modulo theories (SMT) solvers to
detect when arrays are accessed at equal indices, which is one of the
conditions that causes a data-race. However, DRF analysis relying on
SMT solvers is unable to handle all problem parameters
(e.g. abstracting over the number of threads). The root of this
limitation is nonlinear arithmetic expressions
appearing in array indices, which cannot be tested for equality by
current solvers.

In this talk, we present a method for rendering nonlinear accesses
analyzable that adapts a compiler optimization technique called
delinearization. Our method turns a nonlinear access into a
multi-dimensional access composed of linear expressions that can be
analyzed by an SMT solver. We present a formalization of our
delinearization method, and the main result: a soundness theorem that
proves that delinearization preserves any and all data races in the
original code.
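
The core rewrite can be sketched as follows. This is an illustrative model, not the authors' formalization: recovering (i, j) from the flat index i*W + j via divmod relies on the side condition 0 <= j < W, which a delinearization must justify.

```python
# Delinearization in miniature: a nonlinear flat access i*W + j becomes
# a multi-dimensional access (i, j) whose components are linear, so two
# accesses can be compared componentwise by an SMT solver.
def linearize(i, j, W):
    return i * W + j

def delinearize(flat, W):
    """Invert flat = i*W + j under the assumption 0 <= j < W."""
    return divmod(flat, W)   # (i, j)

# Round-trip check over a small domain: the multi-dimensional access
# denotes the same cell as the flat access.
W = 64
for i in range(4):
    for j in range(W):
        assert delinearize(linearize(i, j, W), W) == (i, j)

print(delinearize(130, 64))  # (2, 2), i.e. row 2, column 2
```

Soundness in the paper's sense means exactly that this translation preserves all aliasing, and hence all data races, of the original flat accesses.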


Property Based Testing to verify a pipelined CPU design (hardware)

Rishiyur S. Nikhil (Bluespec, Inc)

Property Based Testing of software is well established: Haskell
QuickCheck was the pioneer, and Wikipedia lists ports to about 40
other programming languages.

In this talk we describe our experience using QuickCheck with
_hardware_, to verify "Fife", a 6-stage pipelined RISC-V CPU.  We
describe the setup for TestRIG (from the University of Cambridge): its
use of Haskell QuickCheck for RISC-V test generation and for
shrinking; use of the official RISC-V Formal Model written in the Sail
ISA Description Language; necessary configuration of TestRIG and the
Sail RISC-V Formal Model.

We describe necessary instrumentation for "Direct Instruction
Injection" and effect-reporting in any implementation being tested
(much of it reusable across implementations).

We describe how to deal with a complication: a pipelined CPU's
speculative instruction execution, due to branch and PC prediction,
and traps.

We describe proposed extensions to TestRIG to handle non-deterministic
events (such as interrupts) and micro-architectural properties (not
shared across implementations).

We also briefly describe BlueCheck, an implementation of QuickCheck
written in the hardware design language Bluespec BSV, so that
BlueCheck can itself be run in hardware (such as an FPGA) along with
the CPU design being tested, for much greater testing speed.
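
The generate/compare/shrink loop at the heart of this approach can be sketched with the standard library alone. This is a toy: the two-instruction accumulator ISA and the greedy shrinker are invented for exposition, whereas the real setup generates RISC-V instructions via TestRIG and compares against the Sail formal model.

```python
# QuickCheck-style loop: generate random instruction sequences, run
# them through a reference model and the implementation under test, and
# greedily shrink any failing sequence to a minimal counterexample.
import random

def reference(prog):
    acc = 0
    for op, n in prog:
        acc = acc + n if op == "add" else acc * n
    return acc

implementation = reference   # stand-in for the design under test

def shrink(prog, fails):
    """Drop instructions one at a time while the test still fails."""
    i = 0
    while i < len(prog):
        candidate = prog[:i] + prog[i + 1:]
        if fails(candidate):
            prog = candidate
        else:
            i += 1
    return prog

def quickcheck(trials=100):
    rng = random.Random(0)
    fails = lambda p: reference(p) != implementation(p)
    for _ in range(trials):
        prog = [(rng.choice(["add", "mul"]), rng.randrange(1, 5))
                for _ in range(rng.randrange(1, 20))]
        if fails(prog):
            return shrink(prog, fails)   # minimal counterexample
    return None                          # no divergence found

print(quickcheck())  # None
```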


Caruca: Effective and Efficient Specification Mining for Opaque Software Components

Evangelos Lamprou (Brown University)

A wealth of state-of-the-art systems demonstrate impressive
improvements in performance, security, and reliability on programs
composed of opaque components, such as Unix shell commands. To reason
about commands, these systems require partial specifications. However,
creating such specifications is a manual, laborious, and error-prone
process, limiting the practicality of these systems. The talk will
present Caruca, a system for automatic specification mining for opaque
commands.

Specifically, I will describe how Caruca overcomes the challenge of
language diversity across commands by first instrumenting a large
language model to translate a command's user-facing documentation into
a structured invocation syntax. Using this representation, Caruca
explores the space of syntactically valid command invocations and
execution environments and concretely executes each
command-environment pair, interposing at the system-call and
filesystem level to extract key command properties such as
parallelizability and filesystem pre- and post-conditions. These
properties can be exported in multiple specification formats and are
immediately usable by existing systems.
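
The dynamic half of such a pipeline can be sketched as running a command in a scratch directory and diffing the filesystem around it. This is a simplification of what the abstract describes: real interposition happens at the system-call and filesystem level, whereas this sketch only snapshots directory listings.

```python
# Mine a coarse filesystem post-condition for an opaque command by
# executing it in a fresh temporary directory and diffing before/after.
import os
import subprocess
import sys
import tempfile

def mine_fs_effects(argv):
    with tempfile.TemporaryDirectory() as d:
        before = set(os.listdir(d))
        subprocess.run(argv, cwd=d, check=True)
        after = set(os.listdir(d))
    return {"creates": sorted(after - before),
            "deletes": sorted(before - after)}

# Use the Python interpreter as a stand-in "opaque command" that
# creates one file, so the example is self-contained.
spec = mine_fs_effects([sys.executable, "-c", "open('out.txt', 'w').close()"])
print(spec)  # {'creates': ['out.txt'], 'deletes': []}
```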

Caruca already generates the full set of specifications used by a
state-of-the-art static analysis tool. I will also present the kinds
of specifications
it can generate for systems like PaSh (OSDI'22), Shellcheck, and
OpenAI's Codex AI assistant.


Orchestrating Program Synthesis: The Search for Better Search

Nada Amin (Harvard)

Can we get LLMs to write verified programs by orchestrating multiple
generation steps? I built systems to find out, and they taught me
lessons about when orchestration works and when it falls apart. I'll
show VerMCTS: it uses Monte Carlo Tree Search to complete programs
left-to-right, with the verifier guiding the search. This works!
Programs grow monotonically, progress is clear, and we know what to
try next. Then I'll demo Dafny Sketcher, which tries structured edits
instead. This... doesn't work as well. Without monotonic progress, how
do you decide which partial program to fix next? Furthermore, I hit a
wall with orchestration: weak LLMs couldn't solve the editing steps
reliably, and strong LLMs just solved everything with a simple repair
loop. So when does orchestration actually help? What makes a synthesis
problem amenable to multi-step search? And can we do better for
non-monotonic search spaces? I'll share what I've learned and what I'm
still trying to figure out.


Benchmarking LLM Agents for CUDA Data-Race Repair with Faial via MCP

Clark Ohlenbusch (UMass Boston)

GPU programming powers modern ML/AI and HPC, but massive parallelism
invites concurrency bugs such as data races. While static analysis can
reason about races without executing kernels, practical use on real
CUDA code faces the challenge of producing actionable feedback from
the alarms generated by static analysis tools.  Large Language Models
(LLMs) are making impressive strides in code synthesis and repair
of programming bugs. However, even with these advancements, LLMs
struggle to identify and reliably repair concurrency bugs.  In this
talk, we assess if LLMs can improve program repair of concurrency
errors by leveraging a static verifier.

We present a new tool that exposes the GPU verifier Faial via an MCP
(Model Context Protocol) interface, making the verifier
programmatically discoverable and directly usable by Artificial
Intelligence (AI) coding agents and LLMs. We evaluate various LLMs on
a dataset of buggy programs. Our preliminary results show that
verifiers can improve the repair accuracy of LLMs.


Touring the Landscape of Contemporary Async/Await Systems

Gavin Gray (Brown University)

Multithreading with coarse-grained parallel computations has long been
the de facto standard for taking advantage of hardware resources.
However, many modern languages have adopted flavors of asynchronous
programming based on fine-grained suspendable computations.

Async/await syntax has cropped up in many contemporary languages for
user-space concurrency. The promise of async/await systems is that
concurrent code is straight-line and developers can “intuitively”
reason about code as if it were synchronous, making programs easier to
read and write while reaping the benefits of concurrency.
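
The "straight-line" promise can be seen in miniature with Python's asyncio; the function names here are illustrative. Each coroutine reads like sequential code, while the runtime interleaves tasks at the fine-grained suspension points marked by await.

```python
# Straight-line async code: fetch() reads like a blocking call, but
# each await is a suspension point where other tasks may run.
import asyncio

async def fetch(name, delay):
    await asyncio.sleep(delay)   # suspension point: other tasks run here
    return f"{name}:done"

async def main():
    # Reads like two sequential calls, yet gather runs them concurrently
    # and returns results in argument order regardless of completion order.
    a, b = await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))
    return [a, b]

results = asyncio.run(main())
print(results)  # ['a:done', 'b:done']
```

The gap between this synchronous reading and what actually happens at each await is precisely where the semantic pitfalls the talk surveys can hide.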

In this talk I will present preliminary work that describes the design
space of async systems. Our research outlines program properties
developers get based on the design decisions of the language, and the
semantic pitfalls that lurk behind the syntactic abstraction of
async/await.


Bridging the Gap: Precise Static Analysis of WebAssembly in a JavaScript World

Michelle Thalakottur (Northeastern University)

WebAssembly is a low-level bytecode that runs alongside JavaScript to
enable near-native performance on the web. Static analysis of
WebAssembly is essential to debloat binaries or to specialize them to
a JavaScript client. However, current static analysis techniques for
WebAssembly are hampered by a closed-world assumption, where the
WebAssembly module is analyzed in isolation from its host JavaScript
client. This approach ignores the rich type information present in the
JavaScript host, leading to significant imprecision, particularly in
resolving indirect function calls, which are prevalent in code
compiled from high-level languages.

This paper bridges the gap between the JavaScript and WebAssembly
worlds. We present a multi-language static analysis that leverages
information from compiler-generated JavaScript wrappers to improve the
precision of a WebAssembly binary analysis. Our key insight is that
the interoperation patterns between JavaScript and WebAssembly reveal
higher-level type information that is lost during compilation to
WebAssembly. We formalize this insight in a refinement type system for
WebAssembly that distinguishes pointers from numeric types, two
concepts conflated in WebAssembly’s i32 type.
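
The flavor of this refinement can be shown with a toy classifier. The instruction format and the two propagation rules below are invented for exposition and are much cruder than WasmBridge's constraint system: a value used as a load address must be a pointer, and a value used arithmetically is treated as a number.

```python
# Classify i32 locals as pointers or numbers from their use sites,
# flagging any local that would need to be both at once.
PTR, NUM, UNKNOWN = "ptr", "num", "?"

def classify(instrs):
    kind = {}
    def set_kind(v, k):
        assert kind.get(v, k) in (k, UNKNOWN), f"{v}: ptr/num conflict"
        kind[v] = k
    for op, *args in instrs:
        if op == "load":         # (load, dst, addr): addr is used as an address
            dst, addr = args
            set_kind(addr, PTR)
            kind.setdefault(dst, UNKNOWN)
        elif op == "mul":        # (mul, dst, a, b): arithmetic means numbers
            dst, a, b = args
            for v in (dst, a, b):
                set_kind(v, NUM)
        elif op == "const":      # (const, dst): no information yet
            kind.setdefault(args[0], UNKNOWN)
    return kind

prog = [("const", "x"), ("const", "p"),
        ("mul", "y", "x", "x"),  # x used arithmetically: a number
        ("load", "r", "p")]      # p used as an address: a pointer
kinds = classify(prog)
print(kinds)  # {'x': 'num', 'p': 'ptr', 'y': 'num', 'r': '?'}
```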

We develop a tool, WasmBridge, that infers these refinement types by
generating and solving constraints derived from both the WebAssembly
binary and its JavaScript wrapper. We evaluate WasmBridge on a suite
of real-world applications compiled from Rust to WebAssembly. Our
approach reduces the number of edges in the generated call graph by an
average of 5% compared to state-of-the-art static-analysis tools. This
precision gain occurs at indirect call sites, where we reduce targets
by an average of 17% at 35.63% of all such sites and discover new
monomorphic call sites.

Furthermore, we demonstrate the practical impact of this precision on
a downstream analysis, showing that WasmBridge enables a dead-code
analysis to identify up to 20% more dead functions than a
state-of-the-art industry tool. Our work shows that even a lightweight
refinement, informed by the multi-language context, can substantially
improve the precision and utility of static analysis for WebAssembly.


Verified Derivation of Fast Tensor Programs

Amanda Liu (MIT)

High-performance applications in domains like vision, scientific
computing, and machine learning rely on optimizing tensor computations
by balancing compute and storage order. We present ATL (A Tensor
Language), a verified framework for optimizing tensor programs through
algebraic, source-level rewrites. To express and manipulate storage
order directly, ATL introduces combinators called reshape operators
that act as both algebraic transformations and compiler
directives. Each operator forms part of an adjoint pair, establishing
verified equivalences that can be composed to perform optimizations as
proven-correct rewrites. This approach allows us to express a wide
range of optimizations, including loop reordering, array layout
transformations, and even sparse tensor formats and vectorization, all
within a unified, verified framework. Generated kernels achieve
performance on par with systems like Halide and TACO, while
maintaining formal correctness guarantees.
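
The adjoint-pair idea can be illustrated with the simplest reshape operator. The operator names below are ours, not ATL's: splitting a flat array into fixed-width tiles and flattening back are inverses, and that verified equivalence is what licenses reordering a reduction across them.

```python
# A reshape operator and its adjoint: split tiles a flat list, flatten
# undoes it, and their round trip justifies a loop-reordering rewrite.
def split(xs, w):
    """Reshape a flat list into tiles of width w (assumes len(xs) % w == 0)."""
    return [xs[i:i + w] for i in range(0, len(xs), w)]

def flatten(tiles):
    """Adjoint of split: recover the flat list."""
    return [x for tile in tiles for x in tile]

xs = list(range(12))
assert flatten(split(xs, 4)) == xs   # the verified equivalence

# A rewrite it licenses: the flat sum equals the tiled (reordered) sum.
flat_sum = sum(xs)
tiled_sum = sum(sum(tile) for tile in split(xs, 4))
print(flat_sum == tiled_sum)  # True
```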


From Ahead-of- to Just-in-Time and Back Again: Static Analysis for Unix Shell Programs

Lukas Lazarek (Brown University)

Shell programming is as prevalent as ever. It is also quite complex,
due to the structure of shell programs, their use of opaque software
components, and their complex interactions with the broader
environment. As a result, even when exercising an abundance of care,
shell developers discover devastating bugs in their programs only at
runtime: at best, shell programs going wrong crash the execution of a
long-running task; at worst, they silently corrupt the broader
environment in which they execute—affecting user data, modifying
system files, and rendering entire systems unusable. Could the shell’s
users enjoy the benefits of semantics-driven static analysis before
their programs’ execution—as offered by most other production
languages?


Last modified Friday, November 14th, 2025 1:00:31pm