1 K generated code
Our first stop is at the Lean code generated by K. The K Framework comes with the klean tool which allows the extraction of Kore definitions into a shallow embedding in Lean.
The generated code is rather large, so we will narrow our focus down to the following:
Overall structure of the generated code
Necessary modifications done to the generated code
Most used pieces
This doesn’t aim to be a full description of the code generated code, but rather, to give the reader a good enough intuition from which to take on. For a preliminary discussion on how the code is generated, see this github issue.
1.1 Structure of the Code
The generated code is divided into five files. In dependent order:
Prelude.lean: Prelude for K definitions containing bootstrapping the shallow embedding.
Sorts.lean: Definition of K sorts as inductive datatypes.
Inj.lean: Instances of the Inj type class.
Func.lean: Extraction of declared functions.
Rewrites.lean: Rewrite rules coded as constructors or the Rewrite inductive type.
1.2 Prelude
The prelude contains the necessary definitions to bootstrap the Lean backend for K. Since the extraction performs a shallow embedding, we need to provide the necessary definitions that make use of already existing Lean definitions.
To give more context on what the Prelude.lean does, we need to understand what are K hooked functions.
K productions can have a hook attribute (e.g., converting integers to bytes), which tells K that the production will rely on an implementation from the target backend with a given name. Since for this project the target backend is the Lean 4 backend, Prelude.lean provides the definitions needed for it.
In particular, we have the following definitions:
1.2.1 Type Abbreviations
As a part of the shallow embedding approach, we alias existing Lean types to signal that they’ll be used as K sorts.
Alias for Bool.
Alias for ByteArray.
Alias for Int.
1.2.2 Builtin Inductives
Some embedded K sorts need to be implemented in the prelude before code generation happens. This is because hooked functions defined in the prelude need to have these sorts already defined.
Bytes endianness.
Bytes signedness for integer representations.
1.2.3 Inj type class
Injection type class.
Every sort in K is a subsort of the KItem sort. This type class will enable the injections of arbitrary sorts into the KItem sort with ease.
Wrapper to Inj.inj to trigger type resolution.
1.2.4 Function Implementations
It is now when we implement some of the hooked functions. Hooked productions not implemented in this prelude will be generated as axioms.
Also note that the return type of all functions is of sort Option ?T. This is to deal with the generation of partial K functions.
SortInt functions
Arithmetic Operations:
some (n + m)
some (n - m)
some (n * m)
Truncated division.
Euclidian modulus.
Integer powmod.
Warning: Leanblueprint has a problem with \(\backslash \)lean{} references that contain special symbols. In this case, the \(\backslash \)lean{} reference is «_^%Int__». This will cause the reference to be broken.
Max operation.
\(log_2\) on positive integers.
Integer complement.
Warning: Leanblueprint has a problem with \(\backslash \)lean{} references that contain special symbols. In this case, the \(\backslash \)lean{} reference is « Int_». This will cause the reference to be broken.
SortBytes functions
Empty bytes wrapper.
SortInt \(\leftrightarrow \) SortBytes conversion
Integer to bytes conversion.
Warning: The Lean reference link is broken because the name of the definition contains commas, and this apparently results in a new line.
Bytes to integer conversion.
Warning: The Lean reference link is broken because the name of the definition contains commas, and this apparently results in a new line.
Bytes manipulation
Pads to the right a bytes array with a specific value.
Warning: The Lean reference link is broken because the name of the definition contains commas, and this apparently results in a new line.
Pads to the right a bytes array with a specific value.
Warning: The Lean reference link is broken because the name of the definition contains commas, and this apparently results in a new line.
Get the length of a bytes array.
Replace contents inside a bytes array.
Warning: The Lean reference link is broken because the name of the definition contains commas, and this apparently results in a new line.
Slice a bytes array up.
Warning: The Lean reference link is broken because the name of the definition contains commas, and this apparently results in a new line.
1.3 Generated Functions
Right now there are no generated functions documented. However, we will only document those functions that play a topmost role in the theorems.
1.3.1 Modifications to the Generated Code
Modifications to the Func.lean file are for adding termination arguments to functions whenever Lean is not able to infer it automatically.
At the moment, there are only two mutual blocks of functions that need an explicit termination argument. These most important functions of those blocks are defined here [32, 33].
To prove termination we define an ordering of the schedule constants and flags here.
1.4 Rewrite
The Rewrite.lean file only contains the definition of the Rewrites type.
This type encodes the rewrite rules that are extracted to Lean.
Rules extracted to Lean from K will be enconded as constructors of this type.
We have that Rewrites : SortGeneratedTopCell → SortGeneratedTopCell → Prop where SortGeneratedTopCell is the type for KEVM states.
Given p, p’ : SortGeneratedTopCell, Rewrites p p’ means that the KEVM state p rewrites to p’.
All of the requires clauses of a rule are encoded as the corresponding constructor’s arguments.
There’s an special transitivy rule that ensures the chaining of multiple rewrite rules:
| tran {s1 s2 s3 : SortGeneratedTopCell} (t1 : Rewrites s1 s2) (t2 : Rewrites s2 s3) : Rewrites s1 s3
Note that we will not be extracting all KEVM rules. Instead, we have produced summary rules and extracted those.
Summary rules are rules that capture KEVM’s computational behavior. Executing an opcode typically takes more than one rule application. Instead of extracting these rules, we’ve put together the result of applying them as these summaries, which KEVM can verify its correctness.
By doing this, we need only extract the summary rules to capture the semantic meaning of executing an opcode in KEVM.