Looks very interesting! We have done a lot with WasmCert-Isabelle (and there's also WasmRef-Isabelle and their 2023 paper, and the earlier WasmCert-Coq); other than being in Lean instead of Isabelle/HOL or Coq, how would you compare the approach you used? E.g. are you able to do an "in-place Store" like WasmRef-Isabelle, and can you represent memories and tables as plain vectors of bytes/refs in memory, can you grow them in-place, etc.? Or any other optimizations/lessons learned?

I'm also curious -- are you just implementing the Wasm binary and text parsing, validation algorithm, and execution semantics in Lean from scratch by reading the English prose in the spec document, and then checking it against the spec tests and the SpecTec description? Or do you have some sort of automated (classical or LLMy) transformation happening? (One could imagine directly transforming the SpecTec, or the OCaml reference interpreter, into Lean... but it sounds like you're not doing that? I think one needs to be a little careful here because e.g. at this point some of the English prose and reference interpreter implementation, and I think maybe some of the tests, are autogenerated from the SpecTec.) Which parts (if any) are outside the scope of the formalization? E.g. for WasmCert-Isabelle, I believe the binary and definitely the text parsing, and I think some of the arithmetic ops, are not covered.

How are you modeling the explicit sources of nondeterminism in the Wasm execution semantics? E.g. NaN representation, {memory., table.}grow, host calls, stack exhaustion, relaxed SIMD instructions, etc., and that's all before we get to the threads proposal? Because if the goal is to prove programs correct, one risk is that I prove my program correct against your Wasm interpreter (which maybe makes certain choices that aren't determined by the spec), and then I run it against another fully-conforming interpreter in the wild and it behaves incorrectly.

There is a lot to unpack in your comment, thanks for commenting.

We are heavily using LLMs and agents for writing and verifying all the code. We have some safeguard inplace, such as not breaking the wasm testsuite, and being able to run wasm code and produce the correct results (even if that is not the goal of this interpreter). There is an ongoing effort (not by us) about creating a formal WASM spec in lean, generated from SpecTec, when that lands our plan is to prove that our interpreter follow the specification.

> E.g. are you able to do an "in-place Store" like WasmRef-Isabelle, and can you represent memories and tables as plain vectors of bytes/refs in memory, can you grow them in-place, etc.? Or any other optimizations/lessons learned?

Even if the interpreter can be use to run wasm code, we don't really care about efficiency, it is not intended to run WASM, but instead to be able to verify code, so we are intentionally not building an optimized interpreter, or rather we are optimizing toward ease of demonstration.

> How are you modeling the explicit sources of nondeterminism in the Wasm execution semantics? E.g. NaN representation, {memory., table.}grow, host calls, stack exhaustion, relaxed SIMD instructions, etc.,

Going over your list, some functions that might fail, like {memory., table.}grow and stack exhaustion can't fail on our interpreter, so that transition is not represented here. We need to revisit this hypothesis, but part of the reasoning is that we all properties we prove about the WASM bytecode, are properties held by the original code, and usually (for example in lean) you can prove that a function has some property for all values of a natural number, even if you can't really run the function for all values due to some kind of "system" failure (stack overflow).

NaN, we certainly need to revisit floats, we are delegaing its behaviour to Lean floats. Again part of why we think this might work is because we are not planning to run code with this interpreter, but just to write proofs.

Host calls is properly modeled and being developed as we speak, since we really care about being able to proofs properties of a code that runs on some particular host. When you write a statement about some code, you do it on the presence of some particular host. I'm exploring the possibility of formalizing NEAR smart contracts, which are WASM binaries.