These abstracts are for talks at this event.
NEPLS is a venue for ongoing research, so the abstract and supplemental material associated with each talk are necessarily provisional. The work presented here may be in a state of flux. In all cases, please consult the authors' Web pages for up-to-date information. Please don't refer to these pages as a definitive source.
minBT: A Minimal Core Calculus for Behavior Trees
Cynthia Li (Northeastern University)
Behavior trees are used in video games to describe non-player character behavior by stringing together conditionals on a sequence of actions: go to the garden - if that succeeds, pick blueberries - while your basket's not full, keep picking - if the player comes into view, say hello! - when you have enough, go to the kitchen and bake a pie. minBT is a minimal language built around these semantics. Because games don't necessarily terminate, we're interested in reasoning about playtraces, user input, and observable state change. It's a small language, but when we reason about it in terms of continuations, stateful effects, algebraic equivalences, and bisimulation, it becomes surprisingly complex. We'd like to share a snapshot of that complexity.
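For readers unfamiliar with the style, the "tick"-based control flow that behavior trees provide can be sketched in a few lines of Python. This is a conventional sequence/selector implementation, not minBT itself, and the NPC actions are stubs invented for illustration:

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Action:
    """Leaf node: wraps a zero-argument function that returns a Status."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self):
        return self.fn()

class Sequence:
    """Tick children left to right; FAILURE or RUNNING short-circuits."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS

class Selector:
    """Tick children left to right; SUCCESS or RUNNING short-circuits."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE

# The blueberry-picking NPC from the abstract, with stubbed actions:
basket = []

def pick_blueberry():
    basket.append("blueberry")
    return Status.SUCCESS if len(basket) < 3 else Status.FAILURE

npc = Sequence(
    Action(lambda: Status.SUCCESS),      # go to the garden
    Selector(
        Action(pick_blueberry),          # keep picking while it succeeds
        Action(lambda: Status.SUCCESS),  # basket full: head to the kitchen
    ),
)
```

Ticking `npc` repeatedly drives the character one step at a time, which is exactly why reasoning about non-terminating games pushes toward playtraces and observable state change rather than final results.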
Fractional Permission Session Types
Selene Wu (Boston University)
Session types provide a formal system for modeling and verifying message-passing concurrent systems. When based on linear logic, they provide a powerful guarantee of deadlock freedom and an absence of data races. Unfortunately, this also carries the restriction that channels be treated strictly linearly, which significantly restricts the expressiveness of the resulting language. We seek to improve expressiveness while maintaining these guarantees by introducing a formal notion of an immutable operation and a system of borrowing based on fractional permissions. This allows channels to be shared between clients without the possibility of deadlocks or data races. Finally, we demonstrate the efficacy of the language on real-world protocols, including a queue data structure and an auction protocol.
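The accounting behind fractional permissions can be sketched concretely. The following Python sketch follows the Boyland-style discipline (a full permission of 1 allows linear, state-changing use; any smaller positive fraction allows only immutable operations); it is illustrative only, and is not the paper's actual system:

```python
from fractions import Fraction

class Permission:
    """Fractional permission on a channel endpoint: 1 grants full
    (linear) use; any fraction in (0, 1) grants read-only use."""

    def __init__(self, amount):
        amount = Fraction(amount)
        assert 0 < amount <= 1, "permissions are fractions in (0, 1]"
        self.amount = amount

    def split(self):
        """Lend half of this permission to another client (a borrow)."""
        self.amount /= 2
        return Permission(self.amount)

    def join(self, returned):
        """Recover a lent fraction when the borrow ends."""
        self.amount += returned.amount
        assert self.amount <= 1

    def can_mutate(self):
        """State-changing operations require the whole permission."""
        return self.amount == 1

owner = Permission(1)
borrowed = owner.split()   # two clients may now use the channel immutably;
                           # neither holds enough to mutate it
owner.join(borrowed)       # borrow ends; full linear use is restored
```

Because the fractions always sum to at most 1, a writer can never coexist with another user of the same channel, which is the intuition behind ruling out data races while still permitting sharing.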
Data Abstraction for Adversarial Security
Jared Pincus (Boston University)
Data abstraction is of growing interest as a means to state and enforce security policies. In this ongoing work, we consider protocols which use ML-style data abstraction to enforce absolute, rather than probabilistic, security against well-typed adversaries; this includes security games with multiple independent adversarial parties. To prove security of such protocols, we employ the contextual equivalence reasoning of ReLoC, a typed relational separation logic built atop Iris and mechanized in Rocq. We demonstrate our approach by proving secure a simple example protocol. Future work will pursue more complex security games, with features such as concurrency, asynchronous channels, and hybrid strategies combining data abstraction and cryptography.
Towards Measure-Based Termination for STLC
Aaron Stump (Boston College)
Proving termination of beta-reduction for simply typed lambda calculus using a measure function is an open problem attributed to Gödel. In this talk, I will explain why the problem is challenging and why a solution would shed new light on the nature of typed reduction. There may even be some surprises.
Epistemic Logic for Polyglots
Luis Garcia (Northeastern University)
A pattern in programming is to stitch together code from disparate programming languages to build a complex program. This pattern is a manifestation of multilanguage computation, a subject concerning the interoperation of languages in a greater system. We discuss ongoing work in developing a type system for multilanguage computation in the propositions-as-types discipline, drawing types from epistemic modal logic.
Foundational Metatheory for Type-based Cryptographic Proofs
Conrad Zimmerman (Northeastern University)
Current tools and frameworks for proving security of cryptographic protocols (EasyCrypt, Squirrel, UC, etc.) require extensive knowledge about the necessary cryptographic assumptions, and often rely on intricately-designed, difficult-to-extend proofs. We have developed a framework that uses types to encode cryptographic properties (in the computational model), building on techniques used in the Owl verifier. Our framework uses the real-vs-ideal model, as in other cryptographic frameworks such as Universal Composability, to encode basic assumptions about cryptographic primitives. We then leverage parametricity, information-flow control, CIU equivalence, and other tools from PL to prove type soundness for primitives. Soundness ensures that cryptographic properties, such as confidentiality or non-malleability, hold for these primitives and every properly-typed protocol that consumes them. Then security of a protocol can be proven by a simple type checker. In this talk we will describe what types mean in this setting and how we can apply tools such as parametricity to the setting of cryptographic proofs. We will then discuss how this type-based reasoning enables modular and extensible cryptographic reasoning. Finally, we will demonstrate how to apply these tools to formal proofs of security for practical protocols.
Nested permissions for disjointness reasoning
Gavin Rohrer (Boston University)
Scalable verification of concurrent heap-manipulating programs requires addressing the frame problem. A popular technique to address framing is separation logic, which introduces a separating conjunction to express that two pieces of memory are disjoint and can be manipulated without interference. However, automating these proofs is difficult due to the complexity of reasoning about the separating conjunction. To address the challenge of automating frame reasoning, this talk introduces a programming model called NPL that achieves framing syntactically, in the program text, rather than in the program logic. Instead of using a single global heap to model memory, NPL uses a collection of potentially nested and globally disjoint partial maps. This disjointness property enables NPL to provide syntax-directed erasure to automatically transform an NPL program into an equivalent program with a single global heap, which can be compiled and executed efficiently. The talk demonstrates the benefits of NPL by showing how it enables verified construction of concurrent data structures, such as a shared vector and a Treiber stack, that require fine-grained concurrency reasoning. NPL also helps reduce the size of an existing proof for the Paxos distributed protocol by 33%.
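The memory model can be illustrated with a small Python sketch (names and representation are my own, not NPL's): memory is a collection of pairwise disjoint partial maps from addresses to values, and erasure merges them into one global heap, an operation that is only unambiguous because the domains never overlap:

```python
def erase(heaplets):
    """Merge disjoint partial maps (address -> value) into one global
    heap. Raises if two heaplets claim the same address, i.e. if the
    disjointness invariant that framing relies on is violated."""
    heap = {}
    for h in heaplets:
        overlap = heap.keys() & h.keys()
        if overlap:
            raise ValueError(f"heaplets not disjoint at {sorted(overlap)}")
        heap.update(h)
    return heap

# Two threads each own a disjoint heaplet, so neither can interfere
# with the other's memory; erasure recovers the single-heap view:
thread_a = {0x10: "head", 0x14: "tail"}
thread_b = {0x20: "count"}
global_heap = erase([thread_a, thread_b])
```

In this framing-by-construction view, a thread that only ever receives its own heaplet cannot even name an address owned by another thread, which is what makes the frame reasoning syntactic rather than logical.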
Automated Reasoning for File System Interactions
Eric Zhao (Brown University)
Real-world software programs often interact with the file system to store and retrieve persistent data. However, verification and bug-finding techniques have traditionally under-served this domain, making it challenging to develop strong confidence in the correctness of such programs. This work presents Accordion, a rigorous foundation for fully automated, ahead-of-time reasoning about file-system-interacting programs, based on sound and complete symbolic execution. The system operates on an archetypal object language designed to capture realistic programs and file system behaviors. By maintaining a symbolic representation of all possible file system states, it can determine the conditions under which a file operation may fail, exploiting domain-specific insights to achieve a precise yet finite model. In this talk, we discuss challenges and insights related to the design of Accordion.
Formalizing Delinearization on GPU Programs
Paul Maynard (UMass Boston), Tiago Cogumbeiro (UMass Boston)
Data races are a major source of concurrency errors in GPU programming. The vast majority of data races are caused by array accesses. The state of the art in static data-race freedom (DRF) analysis relies on satisfiability modulo theories (SMT) solvers to detect when arrays are accessed at equal indices, which is one of the conditions that cause a data race. However, DRF analysis relying on SMT solvers is unable to handle all problem parameters (e.g., abstracting over the number of threads). This limitation stems from nonlinear arithmetic expressions appearing in array indices, which current solvers cannot test for equality. In this talk, we present a method for rendering nonlinear accesses analyzable by adapting a compiler optimization technique called delinearization. Our method turns a nonlinear access into a multi-dimensional access composed of linear expressions that can be analyzed by an SMT solver. We present a formalization of our delinearization method and its main result: a soundness theorem proving that delinearization preserves all data races in the original code.
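The core idea can be sketched concretely (an illustrative Python sketch of the transformation, not the paper's formal method): a flat access like a[x * n + y] mixes the symbolic bound n into a nonlinear index, but under the bounds assumption 0 <= y < n it is equivalent to the two-dimensional access a[x][y], whose per-dimension index expressions are linear and hence SMT-friendly:

```python
def delinearize(flat_index, n):
    """Recover coordinates (x, y) from x * n + y, assuming 0 <= y < n."""
    return divmod(flat_index, n)

n = 8                  # stands in for a symbolic problem parameter
thread0 = 3 * n + 5    # thread 0 accesses a[3 * n + 5]
thread1 = 4 * n + 2    # thread 1 accesses a[4 * n + 2]

# After delinearization, "same location" becomes coordinate-wise
# equality of linear expressions, which a solver can decide even
# when n is left symbolic.
coords0 = delinearize(thread0, n)
coords1 = delinearize(thread1, n)
```

Here the two threads' coordinates differ in both dimensions, so the accesses are disjoint for every value of n, a conclusion the flat nonlinear encoding cannot deliver.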
Property Based Testing to verify a pipelined CPU design (hardware)
Rishiyur S. Nikhil (Bluespec, Inc)
Property-based testing of software is well established: Haskell QuickCheck was the pioneer, and Wikipedia lists ports to about 40 other programming languages. In this talk we describe our experience using QuickCheck with _hardware_, to verify "Fife", a 6-stage pipelined RISC-V CPU. We describe the setup for TestRIG (from the University of Cambridge): its use of Haskell QuickCheck for RISC-V test generation and for shrinking; its use of the official RISC-V Formal Model written in the Sail ISA Description Language; and the necessary configuration of TestRIG and the Sail RISC-V Formal Model. We describe the instrumentation needed for "Direct Instruction Injection" and effect-reporting in any implementation being tested (much of it reusable across implementations). We describe how to deal with a complication: a pipelined CPU's speculative instruction execution, due to branch and PC prediction, and traps. We describe proposed extensions to TestRIG to handle non-deterministic events (such as interrupts) and micro-architectural properties (not shared across implementations). We also briefly describe BlueCheck, an implementation of QuickCheck written in the hardware design language Bluespec BSV, so that BlueCheck can itself be run in hardware (such as on an FPGA) alongside the CPU design being tested, for much greater testing speed.
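For readers new to property-based testing, the generate-then-shrink loop at its heart fits in a few lines. The following is a minimal stdlib-only Python sketch of the QuickCheck idea, not QuickCheck or TestRIG themselves; a buggy absolute-value function stands in for the device under test (for Fife, the generated inputs would be RISC-V instruction sequences):

```python
import random

def check(prop, gen, shrink, tries=200, seed=0):
    """Mini property-based tester: generate random inputs, then
    greedily shrink the first counterexample found."""
    rng = random.Random(seed)
    for _ in range(tries):
        x = gen(rng)
        if prop(x):
            continue
        shrinking = True
        while shrinking:      # keep any smaller input that still fails
            shrinking = False
            for candidate in shrink(x):
                if not prop(candidate):
                    x, shrinking = candidate, True
                    break
        return x              # minimal counterexample
    return None               # property held on every generated input

def buggy_abs(v):
    return v                  # bug: forgets to negate negatives

counterexample = check(
    prop=lambda v: buggy_abs(v) >= 0,
    gen=lambda rng: rng.randint(-50, 50),
    shrink=lambda v: [c for c in (v // 2, v + 1) if c != v],
)
```

Shrinking is what makes the failures actionable: instead of whatever large random input first tripped the property, the tester reports a near-minimal one, and TestRIG applies the same idea to whittle failing instruction sequences down to short reproducers.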
Caruca: Effective and Efficient Specification Mining for Opaque Software Components
Evangelos Lamprou (Brown University)
A wealth of state-of-the-art systems demonstrate impressive improvements in performance, security, and reliability on programs composed of opaque components, such as Unix shell commands. To reason about commands, these systems require partial specifications. However, creating such specifications is a manual, laborious, and error-prone process, limiting the practicality of these systems. The talk will present Caruca, a system for automatic specification mining for opaque commands. Specifically, I will describe how Caruca overcomes the challenge of language diversity across commands by first using a large language model to translate a command's user-facing documentation into a structured invocation syntax. Using this representation, Caruca explores the space of syntactically valid command invocations and execution environments and concretely executes each command-environment pair, interposing at the system-call and filesystem level to extract key command properties such as parallelizability and filesystem pre- and post-conditions. These properties can be exported in multiple specification formats and are immediately usable by existing systems. Caruca already powers the full specifications for a state-of-the-art static analysis tool. I will also present the kinds of specifications it can generate for systems like PaSh (OSDI'22), ShellCheck, and OpenAI's Codex AI assistant.
Orchestrating Program Synthesis: The Search for Better Search
Nada Amin (Harvard)
Can we get LLMs to write verified programs by orchestrating multiple generation steps? I built systems to find out, and they taught me lessons about when orchestration works and when it falls apart. I'll show VerMCTS: it uses Monte Carlo Tree Search to complete programs left-to-right, with the verifier guiding the search. This works! Programs grow monotonically, progress is clear, and we know what to try next. Then I'll demo Dafny Sketcher, which tries structured edits instead. This... doesn't work as well. Without monotonic progress, how do you decide which partial program to fix next? Furthermore, I hit a wall with orchestration: weak LLMs couldn't solve the editing steps reliably, and strong LLMs just solved everything with a simple repair loop. So when does orchestration actually help? What makes a synthesis problem amenable to multi-step search? And can we do better for non-monotonic search spaces? I'll share what I've learned and what I'm still trying to figure out.
Benchmarking LLM Agents for CUDA Data-Race Repair with Faial via MCP
Clark Ohlenbusch (UMass Boston)
GPU programming powers modern ML/AI and HPC, but massive parallelism invites concurrency bugs such as data races. While static analysis can reason about races without executing kernels, practical use on real CUDA code faces the challenge of producing actionable feedback from the alarms generated by static analysis tools. Large Language Models (LLMs) are making impressive strides in code synthesis and the repair of programming bugs. However, even with these advancements, LLMs struggle to identify and reliably repair concurrency bugs. In this talk, we assess whether LLMs can improve program repair of concurrency errors by leveraging a static verifier. We present a new tool that exposes the GPU verifier Faial via an MCP (Model Context Protocol) interface, making the verifier programmatically discoverable and directly usable by Artificial Intelligence (AI) coding agents and LLMs. We evaluate various LLMs on a dataset of buggy programs. Our preliminary results show that verifiers can improve the repair accuracy of LLMs.
Touring the Landscape of Contemporary Async/Await Systems
Gavin Gray (Brown University)
Multithreading with coarse-grained parallel computations has long been the de facto standard for taking advantage of hardware resources. However, the landscape of modern languages has shifted toward flavors of asynchronous programming based on fine-grained suspendable computations. Async/await syntax has cropped up in many contemporary languages for user-space concurrency. The promise of async/await systems is that concurrent code is straightline and developers can “intuitively” reason about code as if it were synchronous, making programs easier to read and write while reaping the benefits of concurrent code. In this talk I will present preliminary work that describes the design space of async systems. Our research outlines the program properties developers get based on the design decisions of the language, and the semantic pitfalls that lurk behind the syntactic abstraction of async/await.
Bridging the Gap: Precise Static Analysis of WebAssembly in a JavaScript World
Michelle Thalakottur (Northeastern University)
WebAssembly is a low-level bytecode that runs alongside JavaScript to enable near-native performance on the web. Static analysis of WebAssembly is essential to debloat binaries or to specialize them to a JavaScript client. However, current static analysis techniques for WebAssembly are hampered by a closed-world assumption, where the WebAssembly module is analyzed in isolation from its host JavaScript client. This approach ignores the rich type information present in the JavaScript host, leading to significant imprecision, particularly in resolving indirect function calls, which are prevalent in code compiled from high-level languages. This paper bridges the gap between the JavaScript and WebAssembly worlds. We present a multi-language static analysis that leverages information from compiler-generated JavaScript wrappers to improve the precision of a WebAssembly binary analysis. Our key insight is that the interoperation patterns between JavaScript and WebAssembly reveal higher-level type information that is lost during compilation to WebAssembly. We formalize this insight in a refinement type system for WebAssembly that distinguishes pointers from numeric types, two concepts conflated in WebAssembly’s i32 type. We develop a tool, WasmBridge, that infers these refinement types by generating and solving constraints derived from both the WebAssembly binary and its JavaScript wrapper. We evaluate WasmBridge on a suite of real-world applications compiled from Rust to WebAssembly. Our approach reduces the number of edges in the generated call graph by an average of 5% compared to state-of-the-art static-analysis tools. This precision gain occurs at indirect call sites, where we reduce targets by an average of 17% at 35.63% of all such sites and discover new monomorphic call sites. 
Furthermore, we demonstrate the practical impact of this precision on a downstream analysis, showing that WasmBridge enables a dead-code analysis to identify up to 20% more dead functions than a state-of-the-art industry tool. Our work shows that even a lightweight refinement, informed by the multi-language context, can substantially improve the precision and utility of static analysis for WebAssembly.
Verified Derivation of Fast Tensor Programs
Amanda Liu (MIT)
High-performance applications in domains like vision, scientific computing, and machine learning rely on optimizing tensor computations by balancing compute and storage order. We present ATL (A Tensor Language), a verified framework for optimizing tensor programs through algebraic, source-level rewrites. To express and manipulate storage order directly, ATL introduces combinators called reshape operators that act as both algebraic transformations and compiler directives. Each operator forms part of an adjoint pair, establishing verified equivalences that can be composed to perform optimizations as proven-correct rewrites. This approach allows us to express a wide range of optimizations, including loop reordering, array layout transformations, and even sparse tensor formats and vectorization, all within a unified, verified framework. Generated kernels achieve performance on par with systems like Halide and TACO, while maintaining formal correctness guarantees.
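The flavor of reshape operators can be conveyed with the classic split/flatten pair (an illustrative Python sketch under my own naming, not ATL's actual combinators): splitting a one-dimensional array into tiles and flattening it back are mutually inverse on well-shaped inputs, so a tiled, reordered reduction can be rewritten to and from the flat one with the result provably unchanged:

```python
def split(xs, k):
    """Reshape a flat list into tiles of length k (len(xs) % k == 0)."""
    assert len(xs) % k == 0
    return [xs[i:i + k] for i in range(0, len(xs), k)]

def flatten(tiles):
    """Inverse reshape: concatenate tiles back into a flat list."""
    return [x for tile in tiles for x in tile]

xs = list(range(12))
tiles = split(xs, 4)

# One round trip of the paired operators is the identity, so the
# tiled (loop-reordered) reduction computes the same value as the
# flat one, by rewriting rather than by re-deriving the program.
round_trip = flatten(split(xs, 4))
tiled_sum = sum(sum(tile) for tile in tiles)
```

The verified version of such an equivalence is exactly what lets a rewrite like loop tiling be applied as a proven-correct step rather than trusted compiler magic.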
From Ahead-of- to Just-in-Time and Back Again: Static Analysis for Unix Shell Programs
Lukas Lazarek (Brown University)
Shell programming is as prevalent as ever. It is also quite complex, due to the structure of shell programs, their use of opaque software components, and their complex interactions with the broader environment. As a result, even when exercising an abundance of care, shell developers discover devastating bugs in their programs only at runtime: at best, shell programs going wrong crash the execution of a long-running task; at worst, they silently corrupt the broader environment in which they execute—affecting user data, modifying system files, and rendering entire systems unusable. Could the shell’s users enjoy the benefits of semantics-driven static analysis before their programs’ execution—as offered by most other production languages?
Last modified Friday, November 14th, 2025 1:00:31pm