Explicate non-RLP-encodable structures. by acoglio · Pull Request #736 · ethereum/yellowpaper

acoglio · 2019-03-29T07:17:46Z

Besides the changes to the text that can be easily seen in the diff, this commit changes

to

and also

to

These new definitions of the RLP encoding functions are consistent with my formalization of RLP encoding in the ACL2 theorem prover, which I have proved to be injective and prefix-unambiguous. The prefix-unambiguity property means that no valid encoding is a strict prefix of another valid encoding; this ensures decodability from a stream of bytes that may not have an end-of-encoding marker.
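To illustrate why prefix-unambiguity matters for stream decoding, here is a minimal Python sketch (not the ACL2 formalization mentioned above). It assumes the usual RLP first-byte ranges (128, 183, 192, 247) and a stream that consists of valid encodings placed back to back. The names `first_item_length` and `split_stream` are hypothetical and chosen only for this example.

```python
from typing import List


def first_item_length(stream: bytes) -> int:
    """Return the total length of the RLP encoding that starts at stream[0].

    Only the first byte and, for long items, the length bytes that follow it
    are inspected; no end-of-encoding marker is needed.
    """
    if not stream:
        raise ValueError("empty stream")
    b0 = stream[0]
    if b0 < 0x80:            # single byte below 128 encodes itself
        return 1
    if b0 <= 0xB7:           # short byte array: 0..55 payload bytes
        return 1 + (b0 - 0x80)
    if b0 <= 0xBF:           # long byte array: 1..8 length bytes follow
        n = b0 - 0xB7
        return 1 + n + int.from_bytes(stream[1:1 + n], "big")
    if b0 <= 0xF7:           # short list: 0..55 payload bytes
        return 1 + (b0 - 0xC0)
    n = b0 - 0xF7            # long list: 1..8 length bytes follow
    return 1 + n + int.from_bytes(stream[1:1 + n], "big")


def split_stream(stream: bytes) -> List[bytes]:
    """Split back-to-back RLP encodings into the individual encodings."""
    items = []
    while stream:
        k = first_item_length(stream)
        if k > len(stream):
            raise ValueError("truncated encoding")
        items.append(stream[:k])
        stream = stream[k:]
    return items
```

Because no valid encoding is a strict prefix of another, the length computed from the leading bytes is the only possible boundary for the first item, so the split is unambiguous.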

Excessively large structures cannot be encoded in RLP: specifically, byte arrays that contain 2^64 or more bytes, and lists whose concatenated serialized items contain 2^64 or more bytes. These restrictions ensure that the first byte of an encoding is indeed a byte, and that the sets of possible first bytes for byte-array encodings and for list encodings are disjoint. See also the encode_length function in the RLP page of the Ethereum Wiki. Prior to this commit, the definition of the RLP function in Appendix B did not explicate these restrictions. Even though they can be reasonably inferred from the fact that RLP encodings must be easily decodable, this commit improves clarity by having the RLP function return an explicit "error" value when the input structure cannot be encoded. This only requires a few more cases in the equations that define the functions R_b, R_l, and s. The error value is currently \varnothing, but a different symbol could be used instead.
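The idea of an explicit error value can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the Yellow Paper's definition: `None` stands in for \varnothing, byte arrays are modeled as `bytes`, lists as Python lists, and the helper names `be`, `length_prefix`, and `rlp` are hypothetical. Propagating the error from a sub-item to the enclosing list is also an assumption of this sketch.

```python
from typing import Optional

LIMIT = 2 ** 64  # payloads of this many bytes or more are not encodable


def be(n: int) -> bytes:
    """Minimal big-endian representation of a positive integer."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


def length_prefix(size: int, offset: int) -> Optional[bytes]:
    """Prefix for a payload of `size` bytes; offset is 0x80 (bytes) or 0xC0 (list)."""
    if size < 56:
        return bytes([offset + size])
    if size < LIMIT:
        size_bytes = be(size)  # 1..8 bytes because size < 2**64
        return bytes([offset + 55 + len(size_bytes)]) + size_bytes
    return None  # explicit error value: structure too large


def rlp(x) -> Optional[bytes]:
    """Encode bytes or a (nested) list of bytes; return None if not encodable."""
    if isinstance(x, (bytes, bytearray)):
        data = bytes(x)
        if len(data) == 1 and data[0] < 0x80:
            return data  # a single byte below 128 is its own encoding
        prefix = length_prefix(len(data), 0x80)
        return None if prefix is None else prefix + data
    payload = b""
    for item in x:
        enc = rlp(item)
        if enc is None:
            return None  # assumption: an error in any item propagates upward
        payload += enc
    prefix = length_prefix(len(payload), 0xC0)
    return None if prefix is None else prefix + payload
```

For example, `rlp(b"dog")` gives `b"\x83dog"`, and `length_prefix(2**64, 0x80)` gives `None` without having to allocate a payload of that size.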

nicksavers · 2019-04-04T07:18:21Z

@acoglio I agree that the limit was not properly specified and can agree to this change. I can't, however, check whether all client implementations comply exactly with this formal definition. Could you perhaps get various client teams to sign off on this to avoid any incompatibilities?

acoglio · 2019-04-08T02:13:24Z

@nicksavers I will contact the teams.

acoglio · 2019-04-09T20:01:46Z

@nicksavers I posted a message to the Go Ethereum Discord general channel, then I saw that you had done that already, and the RLP implementor confirmed that the spec change is okay (message of 2019-04-02 from user fjl on general channel).

Note that the 2^64 limit is inherent to the encoding method:

If we wanted to encode strings of 2^64 or more bytes, we would need 9 or more bytes for the length. But adding 9 or more to 183 (in equation (180)) yields a first byte that is 192 or more, which would thus overlap with the encodings of lists, whose first byte is 192 or more. So the decoder would not be readily able to distinguish strings from lists based on the first byte.
If we wanted to encode lists whose concatenated component encodings are 2^64 bytes or more, we would need 9 or more bytes for the length. But adding 9 or more to 247 (in equation (183)) yields 256 or more, which does not fit in a byte.
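The arithmetic in the two observations above can be checked with a short Python sketch. The function names `bytes_needed` and `long_form_first_byte` are hypothetical; the constants 183 and 247 are the ones quoted from equations (180) and (183).

```python
def bytes_needed(size: int) -> int:
    """Number of bytes in the minimal big-endian representation of size (size >= 1)."""
    return (size.bit_length() + 7) // 8


def long_form_first_byte(base: int, size: int) -> int:
    """First byte of a long-form encoding: base plus the number of length bytes."""
    return base + bytes_needed(size)


# Largest encodable payload size: 8 length bytes.
print(long_form_first_byte(183, 2 ** 64 - 1))  # byte arrays, stays below 192
print(long_form_first_byte(247, 2 ** 64 - 1))  # lists, still fits in a byte

# One byte too many: 9 length bytes.
print(long_form_first_byte(183, 2 ** 64))      # collides with the list range
print(long_form_first_byte(247, 2 ** 64))      # no longer fits in a byte
```

With 8 length bytes the first bytes are 191 and 255; with 9 they become 192 and 256, which is exactly the overlap and overflow described above.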

The 2^64 limits, although not explicated by the YP, could be argued to be inferable based on the above observations. But I believe that explicating them makes things clearer in the YP.

acoglio · 2019-04-09T20:06:48Z

Besides extending the equations, my commit includes the following added text that makes the above observations (in more concise form; I can expand them in a new commit if you think it would be useful):

Byte arrays containing $2^{64}$ or more bytes cannot be encoded. This restriction ensures that the first byte of the encoding of a byte array is always below 192, and thus it can be readily distinguished from the encodings of sequences in $\mathbb{L}$.

Sequences whose concatenated serialized items contain $2^{64}$ or more bytes cannot be encoded. This restriction ensures that the first byte of the encoding does not exceed 255 (otherwise it would not be a byte).

nicksavers merged commit 2a79beb into ethereum:master Apr 11, 2019

acoglio deleted the rlp-err branch April 11, 2019 23:01

acoglio mentioned this pull request Apr 13, 2019

Specify maximum byte sequence and tree structure lengths #648

Closed
