4 releases (2 breaking)
Uses new Rust 2024
| 0.3.0 | Nov 20, 2025 |
|---|---|
| 0.2.1 | Sep 22, 2025 |
| 0.2.0 | Aug 1, 2025 |
| 0.1.0 | Apr 9, 2025 |
#1214 in Encoding
Used in 5 crates
(3 directly)
24KB
265 lines
Compact U64
A machine-friendly varint, implemented in Rust.
See the crate docs for more information, or read the more detailed specification here.
lib.rs:
A variable-length encoding for unsigned 64 bit integers.
The core idea of this encoding scheme is to split the encoding into two parts: a tag which indicates how many bytes are used to encode the int, and then the actual int encoding, encoded in zero (sufficiently small ints can be inlined into the tag), one, two, four, or eight bytes. You can use tags of any width between two and eight bits — the wider the tag, the more int encodings can be inlined into the tag. The advantage of smaller tags is that multiple of them can fit into a single byte.
We have a detailed writeup on the encoding here. But to keep things self-contained, here is a precise definition of the possible codes for any u64 n and any number 2 <= tag_width <= 8:
- You can use the numerically greatest possible
tag_width-bit integer as the tag, and the eight-byte big-endian encoding ofnas the int encoding. - If
n < 256^4, you can use the numerically second-greatest possibletag_width-bit integer as the tag, and the four-byte big-endian encoding ofnas the int encoding. - If
n < 256^2, you can use the numerically third-greatest possibletag_width-bit integer as the tag, and the two-byte big-endian encoding ofnas the int encoding. - If
n < 256, you can use the numerically third-greatest possibletag_width-bit integer as the tag, and the one-byte encoding ofnas the int encoding. - If
tag_width > 2, and ifnis less than the numerically fourth-greatesttag_width-bit integer, you can use thetag_width-bit encoding ofnas the tag, and the empty string as theint encoding.
Our implementation uses the codec module of ufotofu to abstract over actual byte storage.
Encoding API
Use write_tag to write tags at arbitrary offsets into any [u8], and use cu64_encode to write the minimal int encoding for any given tag width into a BulkConsumer<Item = u8>.
use ufotofu::codec_prelude::*;
use compact_u64::*;
// Encode two u64s using two four-bit tags, combining the tags into a single byte.
let n1 = 258; // Requires two bytes for its int encoding.
let n2 = 7; // Can be inlined into the tag.
let tag_width1 = 4;
let tag_offset1 = 0;
let tag_width2 = 4;
let tag_offset2 = 4;
let mut tag_byte = 0;
// Write the four-bit tag for `n1` into `tag_byte`, starting at the most significant bit.
write_tag(&mut tag_byte, tag_width1, tag_offset1, n1);
// Write the four-bit tag for `n2` into `tag_byte`, starting at the fifth-most significant bit.
write_tag(&mut tag_byte, tag_width2, tag_offset2, n2);
// First four bits indicate a 2-byte int encoding, remaining four bits inline the integer `7`.
assert_eq!(tag_byte, 0xd7);
// The buffer into which we will write the encodings.
let mut buf = [0; 3];
let mut con = (&mut buf).into_consumer();
// First write the byte containing the two tags.
con.consume_item(tag_byte).await;
// Writing the int encoding for `n1` will write the two-byte big-endian code for `n1`.
cu64_encode(n1, tag_width1, &mut con).await.unwrap();
// Writing the int encoding for `n2` is a no-op, because `n2` is inlined in the tag.
cu64_encode(n2, tag_width2, &mut con).await.unwrap();
assert_eq!(buf, [0xd7, 1, 2]);
You can further use cu64_len_of_encoding to determine the number of bytes an int encoding would require for a given tag.
Decoding API
Use cu64_decode to decode the int encoding for some given tag from a BulkProducer<Item = u8>. Use cu64_decode_canonic if you want to reject non-minimal encodings.
use ufotofu::codec_prelude::*;
use compact_u64::*;
// We will decode a single byte containing two four-bit tags, and then decode
// two int encodings corresponding to the two tags.
let tag_width1 = 4;
let tag_offset1 = 0;
let tag_width2 = 4;
let tag_offset2 = 4;
// The encoding of two four-bit tags followed by two int encodings, for ints `258` and `7`.
let mut pro = producer::clone_from_slice(&[0xd7, 1, 2][..]);
// Read the byte that combines the two tags from the producer.
let mut tag_byte = pro.produce_item().await?;
// Decode the two ints.
let n1 = cu64_decode(tag_byte, tag_width1, tag_offset1, &mut pro).await?;
let n2 = cu64_decode(tag_byte, tag_width2, tag_offset2, &mut pro).await?;
assert_eq!(n1, 258);
assert_eq!(n2, 7);
Standalone APIs
The previous examples demonstrated the APIs for processing tags and int encodings separately. For the common case where you use an eight-bit tag immediately followed by the corresponding int encoding, we offer a more convenient API via cu64_encode_standalone and cu64_decode_standalone (and cu64_decode_canonic_standalone for rejecting non-minimal encodings):
use ufotofu::codec_prelude::*;
use compact_u64::*;
let mut buf = [0; 3];
let mut con = (&mut buf).into_consumer();
cu64_encode_standalone(258, &mut con).await.unwrap();
assert_eq!(buf, [0xfd, 1, 2]);
let n = cu64_decode_standalone(&mut producer::clone_from_slice(&buf[..])).await.unwrap();
assert_eq!(n, 258);
The same functionality is also exposed through the CompactU64 type, which is a thin wrapper around u64 that implements the Encodable, EncodableKnownLength, Decodable, and DecodableCanonic traits.
use ufotofu::codec_prelude::*;
use compact_u64::*;
let mut buf = [0; 3];
let mut con = (&mut buf).into_consumer();
con.consume_encoded(&CompactU64(258)).await.unwrap();
assert_eq!(buf, [0xfd, 1, 2]);
let n: CompactU64 = producer::clone_from_slice(&buf[..]).produce_decoded().await.unwrap();
assert_eq!(n.0, 258);
Invalid Parameters
The offset of a tag must always be a number between zero and seven (inclusive), and the width of a tag must always be a number between two and eight (inclusive). When a function takes a TagWidth and a TagOffset, their sum must be at most eight.
All functions in this crate may exhibit unspecified (but always safe) behaviour if these invariants are broken. When debug assertions are enabled, all functions in this crate are guaranteed to panic when these invariants are broken.
Dependencies
~4MB
~80K SLoC