I’m writing this issue to open up a discussion about what the stateless block execution spec should do whenever there is an error during the stateless block execution.
What is done today in the spec
Before proposing potential options, it is worth briefly explaining how things work today.
The entry point for stateless block execution today is run_stateless_guest(...). Here, we deserialize the SSZ-encoded input and call verify_stateless_new_payload(...), which contains the main stateless execution logic. This last function has a big-ish try-catch wrapping most of the EngineAPI+STF logic.
If any internal logic of verify_stateless_new_payload(...) raises an exception, we return a StatelessValidationResult with the appropriate new_payload_request_root and chain_config (since calculating those is infallible) and set successful_validation = False.
What this means for zkEVMs is that the zkVM can still generate a proof, proving that the stateless execution failed for some reason. These reasons can be:
- Validation of the input (i.e. EL block, execution witness, etc.) failed some rule, e.g. the execution witness is missing data.
- The STF validation failed, e.g. the calculated post-state root doesn’t match the one claimed in the block header.
- Any other implementation bug raised an exception, e.g. a buggy dependency has an out-of-bounds array access for whatever reason (this must not happen, thus it is a bug).
In all three cases, an exception is raised inside verify_stateless_new_payload(...), and the try-catch returns the mentioned StatelessValidationResult with successful_validation = False. If none of these situations happens, an analogous StatelessValidationResult is returned, but with successful_validation = True.
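To make the current behaviour concrete, here is a minimal sketch of the pattern described above. It is not the spec code: the `_sketch` functions, the simplified field types, and the SHA-256 stand-in for the root calculation are all assumptions made for illustration. Only the result field names come from the text above.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StatelessValidationResult:
    # Field names follow the issue text; the types are simplified assumptions.
    new_payload_request_root: bytes
    chain_config: int
    successful_validation: bool


def execute_block_sketch(payload: bytes) -> None:
    # Hypothetical stand-in for the EngineAPI+STF logic. It raises on failure.
    if payload == b"bad-witness":
        raise ValueError("execution witness is missing data")
    if payload == b"buggy":
        empty = []
        _ = empty[1]  # simulates an implementation bug (IndexError)


def verify_stateless_new_payload_sketch(
    payload: bytes, chain_config: int
) -> StatelessValidationResult:
    # The root is computed before the try block, so it is always available.
    # SHA-256 is only a placeholder for the real root calculation.
    root = hashlib.sha256(payload).digest()
    try:
        execute_block_sketch(payload)
    except Exception:
        # Expected validation failures and unexpected bugs end up here alike.
        return StatelessValidationResult(root, chain_config, False)
    return StatelessValidationResult(root, chain_config, True)
```

In this sketch, the missing-witness case and the simulated bug both produce the same output shape, so a verifier cannot tell them apart.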
But note that not every failure that happens during stateless execution falls into this try-catch, so not every failure has this behaviour. If we look at the stateless block execution entrypoint run_stateless_guest(...) (execution-specs/src/ethereum/forks/amsterdam/stateless_guest.py, lines 33 to 41 at 717421a), we see:
    def run_stateless_guest(input_bytes: Bytes) -> Bytes:
        """
        Run the stateless guest with serialized input, return serialized output.
        """
        stateless_input = deserialize_stateless_input(input_bytes)
        stateless_output = verify_stateless_new_payload(stateless_input)

        output_bytes = serialize_stateless_output(stateless_output)
        return output_bytes
If the input SSZ deserialization raises an exception (e.g. the SSZ bytes are invalid), then the stateless execution will crash instead of returning a StatelessValidationResult with successful_validation = False.
In summary, today we don’t have a very clear definition of how “failures” are handled. An invalid STF and an invalid SSZ input are both expected kinds of logic errors, yet they behave differently, while unexpected bugs (e.g. a buggy out-of-bounds array access) behave like an invalid STF.
The goal of this issue is to figure out the best design for handling this.
Potential options and their usefulness
I would say we have two main options:
- Option 1: any exception raised during the stateless block execution returns a StatelessValidationResult with successful_validation = False.
- Option 2: any exception raised during the stateless block execution is not caught, and the program is left to crash, i.e. in practical terms, the zkVM proof can’t be constructed.
Note that both options discard the current reality where we have a mix of both styles. I believe avoiding this mix is useful, since it adds complexity without any clear value and might make things harder to reason about.
Let’s explore each option a bit.
Option 1: always return successful_validation = False
In this option, it is worth noting what a proof verifier should understand when successful_validation = False. It doesn’t mean that the block is invalid; it means that there was at least one non-recoverable error during the stateless execution. This could mean an invalid STF, but also some unexpected bug.
This means that using successful_validation = False proofs to claim that the block builder built an invalid block is not correct. While this case isn’t strictly useful for the protocol, it was once considered useful for allowing the proposer to prove that the block builder built an invalid block.
This raises the question: if successful_validation = False doesn’t say much more than that the stateless execution failed for some generic reason, is this field useful after all, compared to not allowing the proof to be created (i.e. Option 2)?
Also note that, apart from successful_validation = False, the StatelessValidationResult does not always have a valid new_payload_request_root value. If the failure happened inside verify_stateless_new_payload(...), we would return a correct value, since this calculation happens before the try-catch and can’t fail. But if the failure happens before this logic, during SSZ input deserialization, then we can’t really return a meaningful new_payload_request_root since it wasn’t calculated yet, so the value should probably be zeros. This means that the public input for this proof isn’t even attributable to any particular block, so it doesn’t seem very useful for the proof verifier (i.e. the input is invalid, so we don’t even know what the proof was intended to prove).
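As a rough illustration of the zeroed-root problem, the sketch below wraps deserialization in the same catch-all. Everything here is hypothetical: the length-prefixed decoder stands in for SSZ deserialization, `ZERO_ROOT` is an assumed placeholder value, SHA-256 stands in for the root calculation, and the result type is reduced to two fields.

```python
import hashlib
from dataclasses import dataclass

# Assumption: placeholder returned when no root could be computed.
ZERO_ROOT = b"\x00" * 32


@dataclass(frozen=True)
class StatelessValidationResult:
    # Reduced to two fields for illustration.
    new_payload_request_root: bytes
    successful_validation: bool


def deserialize_input_sketch(input_bytes: bytes) -> bytes:
    # Hypothetical decoder standing in for SSZ deserialization:
    # a 1-byte length prefix followed by exactly that many payload bytes.
    if not input_bytes or input_bytes[0] != len(input_bytes) - 1:
        raise ValueError("invalid input bytes")
    return input_bytes[1:]


def verify_sketch(payload: bytes) -> StatelessValidationResult:
    # Hypothetical stand-in for verify_stateless_new_payload(...).
    root = hashlib.sha256(payload).digest()
    return StatelessValidationResult(root, successful_validation=True)


def run_guest_option1(input_bytes: bytes) -> StatelessValidationResult:
    try:
        payload = deserialize_input_sketch(input_bytes)
    except Exception:
        # No root exists yet, so this result is not tied to any block.
        return StatelessValidationResult(ZERO_ROOT, successful_validation=False)
    return verify_sketch(payload)
```

Under these assumptions, every malformed input maps to the same (ZERO_ROOT, False) output, which is what makes such a proof hard to attribute to anything.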
At this point, it seems that Option 1 is only useful for guaranteeing that the stateless execution always returns a StatelessValidationResult in some shape or form and never crashes. That is, in every case, a proof can be created with a StatelessValidationResult as public input.
Note that we could change successful_validation to be non-boolean, to help distinguish between valid blocks, invalid blocks, and any other unexpected crash. This might provide more value in the “block builder built an invalid block” use-case mentioned before, but it adds some complexity.
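One possible shape for such a non-boolean field is sketched below. The `ValidationStatus` enum, the `InvalidBlockError` exception, and the `classify` helper are hypothetical names introduced only for this example; nothing like them is proposed in the spec today.

```python
from enum import Enum
from typing import Callable


class ValidationStatus(Enum):
    # Hypothetical replacement for the boolean successful_validation field.
    VALID = 0
    INVALID_BLOCK = 1
    UNEXPECTED_ERROR = 2


class InvalidBlockError(Exception):
    """Hypothetical exception raised only for expected validation failures."""


def classify(run: Callable[[], None]) -> ValidationStatus:
    # Run the validation logic and map its outcome to a status.
    try:
        run()
    except InvalidBlockError:
        # A rule of the protocol was violated: the block itself is at fault.
        return ValidationStatus.INVALID_BLOCK
    except Exception:
        # Anything else is treated as a bug or otherwise unexpected failure.
        return ValidationStatus.UNEXPECTED_ERROR
    return ValidationStatus.VALID
```

The added complexity shows up here: INVALID_BLOCK can only be trusted if every expected failure path raises the dedicated exception, and if no bug can raise it by accident.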
Option 2: let the stateless execution crash
In this option, we forget about having the successful_validation field, and only return a StatelessValidationResult for successfully executed blocks. Any other case would make the program crash, and a proof must not be generated.
From the proof verifier's perspective, this might seem to have clearer semantics, since a proof that verifies means the block is valid. However, this might be over-optimizing for the attestor use-case, leaving no room for other interpretations in other use-cases.
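For comparison, a minimal sketch of Option 2 could look like the following. As before, the decoder, the execution stub, and the SHA-256 root are hypothetical stand-ins, and returning only the root is a simplification of the real output.

```python
import hashlib


def deserialize_input_sketch(input_bytes: bytes) -> bytes:
    # Hypothetical decoder standing in for SSZ deserialization:
    # a 1-byte length prefix followed by exactly that many payload bytes.
    if not input_bytes or input_bytes[0] != len(input_bytes) - 1:
        raise ValueError("invalid input bytes")
    return input_bytes[1:]


def execute_block_sketch(payload: bytes) -> None:
    # Hypothetical stand-in for the EngineAPI+STF logic.
    if payload.startswith(b"bad"):
        raise ValueError("post-state root mismatch")


def run_guest_option2(input_bytes: bytes) -> bytes:
    # No try block anywhere: any exception propagates and aborts the guest,
    # so an output (and thus a proof) exists only for valid blocks.
    payload = deserialize_input_sketch(input_bytes)
    execute_block_sketch(payload)
    return hashlib.sha256(payload).digest()
```

Here both the malformed input and the failing block abort the run in the same way, so the mixed behaviour disappears, at the cost of never producing a proof about a failure.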
cc @kevaundray