These abstracts are for talks at this event.
NEPLS is a venue for ongoing research, so the abstract and supplemental material associated with each talk are necessarily a snapshot in time. The work presented here may be in a state of flux. In all cases, please consult the authors' Web pages for up-to-date information. Please don't refer to these pages as a definitive source.
Knuckledragger: A Low Barrier Proof Assistant
Philip Zucker (Draper Labs)
Many automated verification systems boil down to a sequence of SMT queries with hand waving in between. Knuckledragger is an LCF-style interactive theorem prover designed as a minimal Python library on top of Z3. Knuckledragger can systematically link SMT queries in a Hilbert-style proof system. All of the Z3 Python bindings are directly exposed and available because the terms, sorts, and formulas of Knuckledragger are literally Z3 Python objects. The Proof objects of Knuckledragger form a DAG of axiom instantiations and SMT queries. The basic design is portable between host languages, but leveraging Python's ecosystem gets a lot for free in terms of libraries, tooling, the ML ecosystem, and interactive Jupyter notebooks. The goal of Knuckledragger is to support software/hardware verification, numerics, and other applications of interest to the hackers, scientists, and engineers who are already comfortable in Python. The project is available at https://github.com/philzook58/knuckledragger.
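To give a rough flavor of the LCF-style idea, here is a minimal sketch in plain Python on top of the Z3 bindings. The names Proof and lemma are illustrative stand-ins chosen for this sketch, not necessarily Knuckledragger's actual API: a proof object can only be constructed by a kernel function that discharges the claim with an SMT query, and the premises it cites form the DAG described above.

    import z3

    class Proof:
        """A proved formula together with the proofs it depends on (a DAG node)."""
        def __init__(self, claim, reasons):
            self.claim = claim      # a z3.BoolRef
            self.reasons = reasons  # parent Proof objects

    def lemma(claim, by=()):
        """Kernel rule: accept `claim` only if Z3 shows the premises entail it."""
        s = z3.Solver()
        for premise in by:
            s.add(premise.claim)
        s.add(z3.Not(claim))
        if s.check() == z3.unsat:
            return Proof(claim, list(by))
        raise ValueError(f"Z3 could not establish {claim}")

    x, y = z3.Ints("x y")
    step1 = lemma(z3.Implies(x > 0, x + 1 > 0))
    step2 = lemma(z3.Implies(z3.And(x > 0, y > 0), x + y + 1 > 0), by=[step1])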
Nested Family Polymorphism for Proofs
Anastasiya Kravchuk-Kirilyuk (Harvard University)
One of the most annoying things about code is that it evolves over time. Old definitions expand, and new definitions spring up, interacting with old code and annihilating its fragile balance. In tandem, any formal proofs that depend on the changed code must evolve as well, leading to ailments such as branch confusion (as in, “which branch has the latest?”) and Sisyphus syndrome. We believe that programming languages themselves should be solving the problems they create: languages should have built-in safety features, making code (and by extension, proofs) modular and reusable by construction. To this end, our ongoing work builds on our earlier work — Persimmon — to assess its capabilities as a proof system for modular, extensible proofs. The original Persimmon is a functional language with type-safe extensibility features, realized via nested family polymorphism. All code in Persimmon is polymorphic to the enclosing family, meaning any inherited or extended code automatically adapts for safe reuse in the derived family. In our current work, we push the limits of what is possible (and reasonable) to unleash the hidden potential of nested family polymorphism for proofs. In this talk, we present a lightning account of our findings to date.
Scaling GPU Data-Race Freedom With Delinearization
Paul Maynard (University of Massachusetts Boston)
GPU programming is the prevalent computing model in many fields, such as machine learning, artificial intelligence, and high-performance computing. GPUs offer the advantage of massive parallelism, but run the risk of concurrency errors such as data races. Static analysis can find such bugs without the high resource cost of dynamic methods, which can be prohibitively expensive in the case of GPUs. A major challenge in static analysis is handling intricate code, such as certain high-dimensional data access patterns that are common in ML/AI code. In this talk, we present a novel approach that leverages compiler optimization methods to scale data-race analysis on such programs. We show examples of higher-dimensional data accesses that stump state-of-the-art analysis tools. We also introduce a technique that makes the analysis more efficient and scalable by reconstructing missing information through delinearization, a technique used in compiler optimization. We introduce our prototype implementation of this technique and evaluate the benefits of this approach, revealing improvements over the state of the art.
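To make delinearization concrete, here is a minimal sketch, assuming a row-major layout and concrete array extents; in practice the extents may be symbolic and must be recovered from the flattened index expression itself.

    def linearize(subscripts, extents):
        """Flatten row-major subscripts, e.g. (i, j, k) over extents (N1, N2, N3)."""
        index = 0
        for sub, extent in zip(subscripts, extents):
            index = index * extent + sub
        return index

    def delinearize(index, extents):
        """Invert linearize: recover the per-dimension subscripts from a flat index."""
        subscripts = []
        for extent in reversed(extents):
            subscripts.append(index % extent)
            index //= extent
        return tuple(reversed(subscripts))

    extents = (4, 5, 6)
    assert delinearize(linearize((2, 3, 4), extents), extents) == (2, 3, 4)

Recovering the per-dimension subscripts lets a race analysis reason about each dimension separately instead of about one opaque flattened expression.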
Regular language types for the streaming shell
Lukas Lazarek (Brown University), Zekai Li (Brown University)
The shell is a popular, powerful language for building data-transformation pipelines by composing black-box components. This composability is a double-edged sword: while it enables the construction of complex pipelines from simple components, it also makes it easy to create nonsensical compositions that silently produce incorrect results, due to the untyped, byte-oriented nature of inter-command communication. This talk presents Shtreams, a type system for shell pipelines that uses regular languages to model pipeline contents and catch composition errors before execution. Shtreams's regular language types provide a flexible way to describe command inputs and outputs, feel familiar within the shell ecosystem, naturally support polymorphism, and enable example-based error reporting, all while retaining the rich decision properties of regular languages. As a result, Shtreams is able to automatically and quickly analyze hundreds of real-world pipelines, catching hundreds of bugs that no other state-of-the-art system identifies.
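As a hedged sketch of the kind of check such types enable (the toy two-letter alphabet and the command "types" below are invented for illustration and are not Shtreams's actual type language or algorithm), a pipe A | B is well-typed when the regular language of A's output is contained in the language B accepts as input, which for finite automata reduces to a reachability check on the product automaton.

    def dfa_contained(a, b, alphabet):
        """Check L(a) <= L(b) for total DFAs given as dicts over a shared alphabet."""
        start = (a["start"], b["start"])
        seen, stack = {start}, [start]
        while stack:
            qa, qb = stack.pop()
            if qa in a["accept"] and qb not in b["accept"]:
                return False  # witness: a word accepted by `a` but rejected by `b`
            for ch in alphabet:
                nxt = (a["delta"][(qa, ch)], b["delta"][(qb, ch)])
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return True

    # Toy alphabet: 'd' for a line of digits, 't' for any other text line.
    alphabet = {"d", "t"}
    # Output type of a hypothetical number-producing command: one or more digit lines (d+).
    producer_out = {"start": 0, "accept": {1},
                    "delta": {(0, "d"): 1, (0, "t"): 2, (1, "d"): 1, (1, "t"): 2,
                              (2, "d"): 2, (2, "t"): 2}}
    # Input type of a hypothetical numeric consumer: any number of digit lines (d*).
    consumer_in = {"start": 0, "accept": {0},
                   "delta": {(0, "d"): 0, (0, "t"): 1, (1, "d"): 1, (1, "t"): 1}}

    assert dfa_contained(producer_out, consumer_in, alphabet)      # pipe type-checks
    assert not dfa_contained(consumer_in, producer_out, alphabet)  # d* is not within d+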
Vernacular software
Eagon Meng (MIT), Daniel Jackson (MIT)
Where is the source of truth in software? From code, specifications, and protocols to documentation and issue trackers, every project sprawls across many mediums. Our troubles in software go far beyond the sheer technical complexity of areas such as modern web development, and often boil down to a single question: "What is even happening?" Donald Knuth's vision of literate programming provided an enticing but often misunderstood approach: weave together a fundamentally human-driven and narratively ordered source of truth with snippets of code, and let our tools tangle that into something executable. This doesn't mean simply interleaving your codebase with documentation, and it certainly would not result in whatever is going on with out-of-order cell evaluation in Jupyter notebooks. Documentation seems to be a second-class citizen in software today. And with the rise of vibe coding and LLMs throwing their weight(s) around, can we really curb software complexity? But wait... are prompts actually causal documentation for code, and are they normalizing the idea of writing down your intention first? We explore how concept design, a theory that aligns independent units of functionality with user understanding, may enable a simpler vision: a vernacular codebase (with or without LLMs) for end-to-end development.
Modeling and Analyzing Performance Cost of GPU Kernels
Gregory Blike (University of Massachusetts Boston)
GPU programming has become the de facto choice for high-performance computation, especially with applications in machine learning, signal processing, and physics simulation. A programmer’s failure to understand the unique constraints of programming a GPU can result in memory performance bugs. Two such bugs, bank conflicts and uncoalesced accesses, occur when concurrent memory accesses by different threads become serialized due to characteristics of the underlying hardware. Static analysis of memory performance in GPU kernels can help programmers identify memory performance bugs without executing the programs, by generalizing the performance overheads as a symbolic expression dependent on a kernel’s configuration, rather than a dynamic analysis, which is limited to a fixed value dependent on a particular execution’s data. In this talk, we introduce a core calculus of warp semantics which abstracts the analysis of GPU kernels into a formalization that accounts for GPU hardware memory constraints. Our abstraction enables a sound over-approximation of memory performance penalties, and in many cases, the exact cost. We also introduce and evaluate an implementation of this abstraction which demonstrates improved correctness, exactness, and range of analyzable kernels.
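For a concrete flavor of the cost being modeled, here is a minimal sketch, assuming 32 threads per warp, 32 shared-memory banks, and word-sized accesses at a fixed stride; a static analysis in the spirit of the talk would instead return a symbolic expression in the stride and kernel configuration rather than a single number.

    def bank_conflict_degree(stride, warp_size=32, num_banks=32):
        """Worst-case serialization for a warp where thread t reads word t * stride."""
        words = [t * stride for t in range(warp_size)]
        per_bank = {}
        for w in words:
            per_bank.setdefault(w % num_banks, set()).add(w)
        # Accesses to distinct words in the same bank are serialized; same-word
        # accesses are broadcast, hence the set of distinct words per bank.
        return max(len(distinct) for distinct in per_bank.values())

    for stride in (1, 2, 32):
        print(stride, bank_conflict_degree(stride))  # 1 -> 1, 2 -> 2, 32 -> 32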
Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification
Yuriy Brun (University of Massachusetts Amherst)
Formal verification using proof assistants, such as Coq, allows for high-quality software. However, the verification process is expensive, requiring significant expertise and manual effort to write proofs. Recent work has explored automating proof synthesis using machine learning and, even more recently, large language models (LLMs), showing that retrieving relevant premises (such as lemmas and definitions) is helpful for these models. We present Rango, a fully automated proof synthesis tool for Coq that uses not only relevant premises but also similar proofs from the current project. Rango uses retrieval augmentation at every step of the proof to automatically determine which proofs and premises to include in the context of its fine-tuned LLM. In this way, Rango adapts to the project and to the evolving state of the proof. We create a new dataset, CoqStoq, of 2,205 open-source Coq projects from GitHub, which includes both training data and a curated evaluation benchmark of well-maintained projects. On this benchmark, Rango synthesizes 27.7% of the proofs, which is 10% more proofs than the prior state-of-the-art tool Tactician. Our evaluation also shows that adding relevant proofs to the context in Rango leads to a 45% increase in the number of theorems proven.
This paper will appear at ICSE 2025, where it won an ACM SIGSOFT Distinguished Paper Award.
Compositional Specification of Concurrent Objects Under Crashes
Arthur Oliveira Vale (Yale University)
In this talk, I will discuss the compositional specification of parallel programs under full-system crashes, such as those that can be caused by power outages or critical component failures. In this context, crash-safety is an important property, as the main functionality of some systems is resilience to crashes. Toward a compositional verification approach for crash-safety under full-system crashes, one observes that crashes propagate instantaneously to all components across all levels of abstraction, even to unspecified components, hindering compositionality. In the presence of concurrency, a correctness criterion that addresses both crashes and concurrency proves necessary. For this, several adaptations of linearizability have been suggested, featuring different trade-offs between complexity and expressiveness. The recently proposed compositional linearizability framework shows that to achieve compositionality with linearizability, both a locality and an observational refinement property are necessary. Despite that, no linearizability criterion with crashes has been proven to support an observational refinement property. I will present our extension of compositional linearizability to the full-system-crash setting, crash-aware linearizability. I will then compare it with pre-existing linearizability criteria in this context, such as strict and durable linearizability, and show a few examples of how crash-aware linearizability may be used to specify crash-resilient systems and libraries.
Encoding Hypotheses and Experiments with Helical
Emma Tosch (Northeastern University)
Have you ever tried building on someone else's research artifact? For example, have you tried to reproduce a baseline or apply an existing technique to a new domain? Which do you find more useful: the artifact or the academic paper? In the first half of this talk I will present Helical, a framework for specifying (some!) hypotheses and experiments at a high level. Helical consists of a hypothesis language, an experiment language, and a configuration that captures the elements of an experimental design that are fixed and those that are free to vary. In the second half of this talk, I will ask the audience to work in teams to encode hypothetical research projects and share with the group what they have observed about this process.
QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning
Zhanna Kaufman (University of Massachusetts Amherst)
Formal verification is a promising method for producing reliable software, but the difficulty of manually writing verification proofs severely limits its utility in practice. Recent methods have automated some proof synthesis by guiding a search through the proof space using a theorem prover. Unfortunately, the theorem prover provides only the crudest estimate of progress, resulting in effectively undirected search. To address this problem, we create QEDCartographer, an automated proof-synthesis tool that combines supervised and reinforcement learning to more effectively explore the proof space. QEDCartographer incorporates the proofs' branching structure, enabling reward-free search and overcoming the sparse reward problem inherent to formal verification. We evaluate QEDCartographer using the CoqGym benchmark. QEDCartographer fully automatically proves 21.4% of the test-set theorems. Previous search-based proof-synthesis tools Tok, Tac, ASTactic, Passport, and Proverbot9001, which rely only on supervised learning, prove 9.6%, 9.8%, 10.9%, 12.5%, and 19.8%, respectively. Compared to the most effective prior tool, Proverbot9001, QEDCartographer produces proofs that are 26% shorter and does so 27% faster, on average over the theorems both tools prove. Together, QEDCartographer and the non-learning-based CoqHammer prove 31.8% of the test theorems. Our work demonstrates that reinforcement learning is a fruitful research direction for improving proof-synthesis tools' search mechanisms.
This work will appear at ICSE 2025.
CoPy: Declarative Object-oriented programming through copatterns
Adriano Corbelino (University of Massachusetts Lowell)
Declarative programming and equational reasoning are often listed as strengths of the functional paradigm. However, this can be achieved in object-oriented languages with copattern matching and its dual, pattern matching. While pattern matching allows us to deconstruct data and branch according to its shape, copattern matching allows us to define the behavior of potentially infinite objects via a collection of observations. Those observations are schematics of a desired matching context that can be substituted by their outcome, an expression. Copatterns can be nested and mixed with patterns, enabling a rich and declarative way to refine our context's specifications. This work extends Python to support a declarative style of defining objects through copatterns and presents a new encoding of copatterns in a terse target language, with regular classes and method overloading, that we named Featherweight Python. This talk will introduce this declarative style by presenting examples of infinite objects (such as streams) and a series of arithmetic evaluators to highlight the compositional capabilities of the system.
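As a hedged illustration of the copattern style in plain Python (this is neither CoPy's surface syntax nor its Featherweight Python encoding; the Stream class and nats_from are invented for this example), an infinite stream can be defined entirely by its observations, head and tail:

    class Stream:
        """An infinite stream defined by its observations rather than by its data."""
        def __init__(self, head_fn, tail_fn):
            self._head_fn = head_fn
            self._tail_fn = tail_fn  # a thunk, so the tail is only built on demand

        def head(self):
            return self._head_fn()

        def tail(self):
            return self._tail_fn()

    def nats_from(n):
        # Copattern-style reading:
        #   head(nats_from(n)) = n
        #   tail(nats_from(n)) = nats_from(n + 1)
        return Stream(lambda: n, lambda: nats_from(n + 1))

    def take(stream, k):
        return [] if k == 0 else [stream.head()] + take(stream.tail(), k - 1)

    print(take(nats_from(0), 5))  # [0, 1, 2, 3, 4]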
Norman Principles of Design + Empirical CS = Dynamic Visualization Tools in FSM
Andres M. Garced (Seton Hall University), Marco T. Morazán (Seton Hall University), Tijana Minic (University of Washington)
This talk discusses the design of dynamic visualization tools for FSM, a functional domain-specific programming language embedded in Racket for the Formal Languages and Automata Theory classroom. The design of these tools is based on three pillars: the Norman principles of effective design, empirical data to foster a smooth user experience, and human-factors studies to measure user perceptions. We use the Norman principles of effective design to reduce the extraneous cognitive load on students. To ensure a smooth user experience, we conducted empirical studies to reduce start-up and response times, which are impacted by image generation, volatile memory demands, and garbage-collector interference. These studies focus on image generation techniques using parallelism, on the use of thunks to reduce volatile memory demands, and on demanding frequent collections by an incremental garbage collector. Finally, we present results of human-factors studies conducted to determine whether students enjoy the tools and find them useful.
Zero-Cost Lexical Effect Handlers
Cong Ma (University of Waterloo)
Exception handlers, and effect handlers more generally, are language mechanisms for structured nonlocal control flow. A recent trend in language-design research has introduced lexically scoped handlers, which address a modularity problem with dynamic scoping. While dynamically scoped handlers allow zero-overhead implementations when no effects are raised, existing implementations of lexically scoped handlers require programs to pay a cost just for having handlers in the lexical context. In this talk, we present a novel implementation strategy for lexically scoped handlers that satisfies the zero-cost principle, a property already met by most modern compilers supporting dynamically scoped exception handlers. The key idea is a type-directed compilation that emits information indicating how handlers come into the lexical scope. This information guides the runtime in walking the stack to locate the right handler. The compilation scheme does not require any reified, lexically scoped representation of handlers when no effects are raised. We capture the essential aspects of this compilation scheme using core calculi. We prove the correctness of the compilation scheme by showing that it preserves semantics. Empirical results suggest that the implementation strategy is effective.
Rust isn’t a Silver Bullet for Systems Research (Yet)
Kinan Dak Albab (Brown University)
Over the past decade, systems researchers have explored how they can leverage language "safety" in Rust to design and build extensible software systems with valuable security or isolation properties. These "Rusty" systems aim to guarantee information flow control, isolation of serverless functions, and crash-resilient kernel extension modules, among others. This talk explores why this approach is imperfect in practice, resulting in serious violations of the guarantees Rusty systems seek to provide. We find that these failures are due to a mismatch between how Rusty systems extend Rust's properties in new ways and how (hidden) unsafe code breaks invariants the systems assume. We showcase this with two realistic examples that cause violations in recently published Rusty systems. We define a precise vocabulary for talking about safety in Rust and the properties thought to hold in "safe" code. Finally, we explore approaches to catch and mitigate these mismatches, including Sniffer, a new targeted code-auditing tool that uncovers such issues with the assistance of static analysis.
From Linearity to Borrowing
Andrew Wagner (Northeastern University)
Linear type systems are powerful because they can statically ensure the correct management of resources like memory, but they can also be cumbersome to work with, since even benign uses of a resource require that it be explicitly threaded through during computation. Borrowing, as popularized by Rust, reduces this burden by allowing one to temporarily disable certain resource permissions (e.g., deallocation or mutation) in exchange for enabling certain structural permissions (e.g., weakening or contraction). In particular, this mechanism spares the borrower of a resource from having to explicitly return it to the lender but nevertheless ensures that the lender eventually reclaims ownership of the resource. In this work, we elucidate the semantics of borrowing by starting with a standard linear type system for ensuring safe manual memory management in an untyped lambda calculus and gradually augmenting it with immutable borrows, lexical lifetimes, reborrowing, and finally mutable borrows. We prove semantic type soundness of our Borrow Calculus (BoCa) using Borrow Logic (BoLo), a novel domain-specific separation logic for borrowing. We establish the soundness of this logic using a semantic model that additionally guarantees that our calculus is terminating and free of memory leaks. We also show that our Borrow Logic is robust enough to establish the semantic safety of some syntactically ill-typed programs that temporarily break but reestablish invariants.
Last modified Tuesday, April 29th, 2025 9:02:04am