EVM Equivalence

2 Interfaces

We are describing the files in the Interfaces folder.

We have to reason about the EvmYul model and the generated K code. But we soon face the challenge that the EvmYul model is more geared towards execution than reasoning and that the generated K code is particularly idiosyncratic.

This is not to the detriment of any of the models, but we do need reasoning interfaces for these models to smooth the reasoning process.

2.1 EvmYul Interface

This interface contains results that aid with the reasoning of EvmYul functions.

There are several type of results in this interface:

  • USize results

  • Compatibility axioms

  • Bytes manipulation results

  • UInt256 results and conversions

  • Opcode semantics

We won’t describe the entirety of the results provided in this interface, but we’ll discuss the two most pressing categories: compatibility axioms and opcode semantics.

2.1.1 Compatibility Axioms

Of particular importance are two axioms necessary to the whole operation. The first one is ffi_zeroes, which involves the EvmYul zeroes function.

Aixiom ffi_zeroes

Definition 26 zeroes
#

zeroes len produces a byte array of size len where every byte is the zero byte.

This part of the EvmYul FFI.

Crucially, the zeroes function is a foreign function interface (FFI) function. This means that there is no Lean implementation of it. Rather, when executed, Lean will call an external function to perform the computational action. However, the zeroes function plays a crucial role in important functions such as writing to memory.

That is why, in order to reason about it, we pose an axiom stating the desired behavior of the zeroes function.

Theorem 27 Axiom: ffi_zeroes
#

zeroes len returns a byte array of size len where every byte is the zero byte.

Aixiom toBytesBigEndian_rw

The second needed axiom is to bypass a private attribute of the EvmYul toBytes’ function defined below:

Definition 28 toBytes’
#

toBytes’ n converts a given natural number n to its little endian bytes representation.

Crucially, this definition is private.

The toBytes’ function is used by toBytesBigEndian:

Definition 29 toBytesBigEndian
#

toBytes’ n converts a given natural number n to its big endian bytes representation.

toBytesBigEndian = List.reverse ∘ toBytes’✝

When dealing with toBytesBigEndian we need to also deal with toBytes’. The following axiom states that toBytesBigEndian is equal to a List.reverse ∘ toBytes’_ax where toBytes’_ax is a dummy definition copied from the EvmYul model.

As future work, we could submit a PR to the EvmYul model repository to change these private attributes.

Theorem 30 Axiom: toBytesBigEndian_rw
#

EvmYul.toBytesBigEndian n = (List.reverse ∘ toBytes’_ax) n

2.1.2 Opcode Semantics

The most relevant theorem of this kind is the behavior of the X function (defined in 39) when executing a single-opcode program and having the program counter be 1.

We then specify the result of running the \(X\) function on a bad opcode.

Theorem 31 X_bad_pc
#

Given

  • a single-opcode program \(P\),

  • a gas amount greater than 1,

  • the program counter set to 1,

  • the symbolic and well constrained remaining inputs of \(X\).

Executing \(X\) on \(P\) with a fully symbolic state with the above conditions is successful and returns an empty output with the following modifications to the symbolic state:

  • The parameter returnData is empty.

  • The executionLength is increased by one.

Proof

Massaging with tactics the execution path that evaluates if the program counter is out of bounds derives this result.

2.2 KEVM Interface

We use this interface to provide intermediate lemmas or better representations for the K-generated Lean definitions.

This interface contains results about the following categories:

  • Simple function behavior: results about simple functions like arithmetic or boolean operations.

  • Behavior of «#sizeWordStack»: Equating «#sizeWordStack» to List.length.

  • Behavior of K’s in_keys function.

  • Behavior of #inStorage function.

  • Results that have to do with memory-related operations.

  • Bytes manipulation: ease the reasoning about ‘SortBytes‘ manipulation functions.

2.3 Gas Interface

KEVM models all EVM hardforks (schedules). So, when executing a program in KEVM, we need to keep track of which hardfork (schedule) are we executing with. Knowing the schedule we’re executing with allows us to know the correct gas cost of an opcode, and to know which rules apply to our schedule of choice.

There are two functions involved in this:

Definition 32 _<_>
#

The «_<_>_SCHEDULE_Int_ScheduleConst_Schedule» function (or _<_> for short) takes as arguments a gas constant (such as \(G_{\text{verylow}}\)) and a schedule (such as CANCUN) and returns how much is that gas constant in that schedule.

Definition 33 _<<_>>

The function «_<<_>>_SCHEDULE_Bool_ScheduleFlag_Schedule» (or _<<_>> for short) takes as arguments a schedule flag (such as \(G_{\text{hascreate2}}\)) and a schedule and returns the corresponding boolean if the schedule has the schedule flag. This is used to discriminate which rules can be applied depending on the schedule we’re working with.

In this example, working on an schedule SCHED, any rule involving the CREATE2 opcode will have to satisfy \(G_{\text{hascreate2}}\)<< SCHED >> in order to be applied.

Given that EvmYul works with the Cancun schedule, the two main results of this interface are a plain rewrite of the _<_> and _<<_>> functions to the Cancun schedule.

Theorem 34 cancun_def
#

Given const : SortScheduleConst, we equate «_<_>_SCHEDULE_Int_ScheduleConst_Schedule» const .CANCUN_EVM to a simple match statement on const.

Proof

This result follows from applying the custom tactics found in Tactics.lean, which boil down to a bunch of simplifications.

Theorem 35 flag_cancun_def
#

Given flag : SortScheduleFlag, we equate «_<<_>>_SCHEDULE_Bool_ScheduleFlag_Schedule» flag .CANCUN_EVM to a simple match statement on flag.

Proof

This result follows from applying the custom tactics found in Tactics.lean, which boil down to a bunch of simplifications.