Explicate non-RLP-encodable structures. #736
Excessively large structures cannot be encoded in RLP: namely, byte arrays that contain 2^64 or more bytes, and lists whose concatenated serialized items contain 2^64 or more bytes. These restrictions ensure that the first byte of an encoding is indeed a byte, and that the first bytes of byte-array encodings and list encodings are disjoint. See also the encode_length function on the RLP page of the Ethereum Wiki. Prior to this commit, the definition of the RLP function in Appendix B did not explicate these restrictions. Even though they can reasonably be inferred from the fact that RLP encodings must be easily decodable, this commit improves clarity by having the RLP function return an explicit "error" value when the input structure cannot be encoded. This only requires a few more cases in the equations that define the functions R_b, R_l, and s. The error value is currently \varnothing, but a different symbol could be used instead.
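A minimal Python sketch of encoding functions with an explicit error case, assuming `None` stands in for the error value \varnothing. The names `rlp`, `r_b`, `r_l`, `s`, `be` and `length_prefix` are hypothetical and only mirror the YP's RLP, R_b, R_l, s and BE; this is not any client's implementation. Because a real input of 2^64 bytes cannot be built in memory, the length check is factored into `length_prefix` so the error case can be tried with plain integers.

```python
from typing import List, Optional, Union

# An RLP structure: a byte array or a (possibly nested) list of structures.
Item = Union[bytes, List["Item"]]

LIMIT = 2 ** 64  # payload lengths >= 2^64 have no encoding


def be(n: int) -> bytes:
    """Big-endian bytes of n without leading zeros (BE in the YP)."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


def length_prefix(n: int, short_base: int, long_base: int) -> Optional[bytes]:
    """Header for an n-byte payload, or None (error) if n >= 2^64."""
    if n < 56:
        return bytes([short_base + n])
    if n < LIMIT:
        return bytes([long_base + len(be(n))]) + be(n)
    return None  # extra case: structure is not RLP-encodable


def r_b(x: bytes) -> Optional[bytes]:
    """Encode a byte array (R_b)."""
    if len(x) == 1 and x[0] < 128:
        return x
    header = length_prefix(len(x), 128, 183)
    return None if header is None else header + x


def s(x: List[Item]) -> Optional[bytes]:
    """Concatenate the item encodings; propagate the error value (s)."""
    parts = [rlp(item) for item in x]
    if any(p is None for p in parts):
        return None
    return b"".join(parts)


def r_l(x: List[Item]) -> Optional[bytes]:
    """Encode a list (R_l); error if s(x) is an error or too long."""
    body = s(x)
    if body is None:
        return None
    header = length_prefix(len(body), 192, 247)
    return None if header is None else header + body


def rlp(x: Item) -> Optional[bytes]:
    """RLP: dispatch on byte array vs. list."""
    return r_b(x) if isinstance(x, bytes) else r_l(x)
```

For example, `rlp(b"dog")` gives `b"\x83dog"`, while `length_prefix(2**64, 128, 183)` returns `None`. Returning `None` from `s` whenever any item fails is one way to make the error propagate through nested lists, matching the idea of extra cases in s, R_b and R_l.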
@acoglio I agree that the limit was not properly specified and can agree to this change. However, I can't check whether all client implementations comply exactly with this formal definition. Could you perhaps get the various client teams to sign off on this, to avoid any incompatibilities?
@nicksavers I will contact the teams.
@nicksavers I posted a message to the Go Ethereum Discord. Note that the 2^64 limit is inherent to the encoding method:
The 2^64 limits, although not explicated by the YP, could arguably be inferred from the above observations. But I believe that explicating them makes the YP clearer.
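The following Python sketch, assuming only the first-byte offsets 128, 183, 192 and 247 from the YP's definitions of R_b and R_l, illustrates why the limit is inherent: the five first-byte ranges partition the values 0..255, and each long-form header has room for at most 8 length bytes, so the largest encodable payload length is 256^8 - 1 = 2^64 - 1. The range and function names are hypothetical.

```python
# First-byte ranges of RLP encodings, read off the YP's R_b and R_l.
SINGLE_BYTE = range(0x00, 0x80)   # ||x|| = 1 and x[0] < 128: the byte itself
SHORT_BYTES = range(0x80, 0xB8)   # 128 + ||x||, for ||x|| < 56
LONG_BYTES = range(0xB8, 0xC0)    # 183 + ||BE(||x||)||
SHORT_LIST = range(0xC0, 0xF8)    # 192 + ||s(x)||, for ||s(x)|| < 56
LONG_LIST = range(0xF8, 0x100)    # 247 + ||BE(||s(x)||)||

ALL_RANGES = [SINGLE_BYTE, SHORT_BYTES, LONG_BYTES, SHORT_LIST, LONG_LIST]


def partitions_byte_values(ranges) -> bool:
    """True if every value 0..255 lies in exactly one range (disjoint and exhaustive)."""
    values = sorted(v for r in ranges for v in r)
    return values == list(range(256))


def max_payload_len(long_range: range) -> int:
    """Largest payload length expressible with a long-form header.

    The header byte is base + k, where k = ||BE(n)|| >= 1 is the number of
    length bytes. k can grow only until the header byte would leave its range
    (into the list range for byte arrays, past 255 for lists), so
    k <= len(long_range) and n <= 256**k - 1.
    """
    max_len_of_len = len(long_range)
    return 256 ** max_len_of_len - 1
```

Here `partitions_byte_values(ALL_RANGES)` is `True`, and `max_payload_len` returns `2**64 - 1` for both `LONG_BYTES` and `LONG_LIST`, which is exactly where the two restrictions come from.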
Besides extending the equations, my commit includes the following added text that makes the above observations (in more concise form; I can expand them in a new commit if you think it would be useful):
Besides the changes to the text that can be easily seen in the diff, this commit changes




to
and also
to
These new definitions of the RLP encoding functions are consistent with my formalization of RLP encoding in the ACL2 theorem prover, which I have proved to be injective and prefix-unambiguous. The prefix-unambiguity property means that no valid encoding is a strict prefix of another valid encoding; this ensures decodability from a stream of bytes that may not have an end-of-encoding marker.
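To illustrate what prefix-unambiguity buys in practice, here is a hypothetical `decode_one` sketch in Python (not taken from the ACL2 formalization or from any client) that reads one item from the front of a byte stream and returns the unconsumed remainder. The header alone determines where the item ends, so consecutive encodings can be split without an end-of-encoding marker. For brevity it does not reject non-canonical encodings (for example, leading zeros in a length), which a strict decoder would.

```python
from typing import List, Tuple, Union

Item = Union[bytes, List["Item"]]


def decode_one(stream: bytes) -> Tuple[Item, bytes]:
    """Decode the first RLP item in stream; return (item, remaining bytes)."""
    if not stream:
        raise ValueError("empty input")
    b0 = stream[0]
    if b0 < 0x80:                      # single byte below 128 encodes itself
        return stream[:1], stream[1:]
    if b0 < 0xB8:                      # short byte array
        n, start = b0 - 0x80, 1
    elif b0 < 0xC0:                    # long byte array: k length bytes follow
        k = b0 - 0xB7
        n, start = int.from_bytes(stream[1:1 + k], "big"), 1 + k
    elif b0 < 0xF8:                    # short list
        n, start = b0 - 0xC0, 1
    else:                              # long list: k length bytes follow
        k = b0 - 0xF7
        n, start = int.from_bytes(stream[1:1 + k], "big"), 1 + k
    if len(stream) < start + n:        # a strict prefix of an encoding
        raise ValueError("truncated input")
    payload, rest = stream[start:start + n], stream[start + n:]
    if b0 < 0xC0:
        return payload, rest
    items: List[Item] = []
    while payload:                     # list payload is s(x): items back to back
        item, payload = decode_one(payload)
        items.append(item)
    return items, rest
```

For example, decoding `b"\x83dog" + b"\xc8\x83cat\x83dog"` yields `b"dog"` first and then `[b"cat", b"dog"]` from the remainder, while the truncated `b"\x83do"` raises `ValueError`.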