9 releases (5 breaking)
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.6.1 | Feb 20, 2026 |
| 0.6.0 | Feb 19, 2026 |
| 0.5.0 | Feb 13, 2026 |
| 0.4.0 | Feb 13, 2026 |
| 0.1.0 | Jan 7, 2026 |
#722 in Development tools
190KB, 4K SLoC
SBE Generator for Rust
This crate provides a small, pragmatic compiler for the
Simple Binary Encoding (SBE) protocol. It reads
SBE XML schemas and produces zero‑copy Rust message types that are
ready for use in performance‑sensitive applications. Generated
structures rely on the zerocopy crate to provide
safe, alignment‑aware views over raw byte buffers.
Unlike the reference SBE toolchain, this project focuses solely on Rust and keeps the API minimal and easy to use. It does not support code generation for other languages.
Features
- Zero‑copy decoding: Generated message structs derive FromBytes, IntoBytes, KnownLayout, Immutable and Unaligned so they can be safely cast from network buffers without copying.
- Spec‑aware encoding builders: Each message gets a FooBuilder companion that writes the fixed block, groups and variable data with the correct offsets, byte order and length prefixes. Builders can also emit the standard SBE message header.
- Byte‑order aware fields: Multi‑byte integer and floating‑point fields use the zerocopy::byteorder types (e.g. little_endian::U32, little_endian::F64) so that endianness is explicit and efficient.
- Field offsets and padding: Explicit offset attributes are honoured, with padding inserted to keep layout in sync with the SBE block length.
- Acting-version aware decoding: Helpers accept the standard SBE message header, apply the advertised block length/version at runtime, and expose presence checks so older payloads still parse safely.
- Declarative parsing helpers: Each generated message implements a parse_prefix helper that leverages zerocopy::Ref to split a slice into a typed prefix and a remainder.
- Groups and variable data included: Nested repeating groups are emitted with iterable views and entry structs, and data fields become VarData slices with an ergonomic as_str() helper.
- Optional fields: presence="optional" fields stay zero‑copy but gain <field>_opt() accessors that return Option based on the SBE null value for that primitive.
- Constant field correctness: presence="constant" fields are treated as non-encoded wire data. Generated code exposes associated constants plus constant accessors, and builders/encoders do not emit writes for those fields.
- Schema reflection: Generated code surfaces SINCE_VERSION, SEMANTIC_TYPE, field offsets and constraint constants so you can reason about compatibility at the call site.
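To make the optional-field bullet concrete, here is a minimal std-only sketch of how a null-sentinel accessor can work. The Quote struct and its field names are hypothetical and this is not the code sbe_gen emits; the only facts relied on are the SBE convention that the null value is the maximum for unsigned integers and the minimum for signed integers.

```rust
/// SBE reserves the maximum value of an unsigned integer type as its null value.
pub const UINT32_NULL: u32 = u32::MAX;
/// SBE reserves the minimum value of a signed integer type as its null value.
pub const INT64_NULL: i64 = i64::MIN;

/// Hypothetical wire struct: fields are kept as raw little-endian bytes,
/// so the struct has alignment 1 and no padding.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct Quote {
    pub qty: [u8; 4],   // uint32, presence="optional"
    pub price: [u8; 8], // int64, presence="optional"
}

impl Quote {
    /// Returns `None` when the field holds the uint32 null sentinel.
    pub fn qty_opt(&self) -> Option<u32> {
        let v = u32::from_le_bytes(self.qty);
        (v != UINT32_NULL).then_some(v)
    }

    /// Returns `None` when the field holds the int64 null sentinel.
    pub fn price_opt(&self) -> Option<i64> {
        let v = i64::from_le_bytes(self.price);
        (v != INT64_NULL).then_some(v)
    }
}
```

The field stays in place in the buffer; only the accessor decides whether the stored value counts as present.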
Usage
Add the sbe_gen crate to your Cargo.toml and build a small driver
program which reads your XML schema and writes the generated code to
disk:
use std::{fs, path::Path};
use sbe_gen::{generate_to, GeneratorOptions};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema_xml = fs::read_to_string("./my-schema.xml")?;
    let out_dir = Path::new("./src/sbe");
    generate_to(&schema_xml, out_dir, &GeneratorOptions::default())?;
    Ok(())
}
Alternatively, install the CLI binary with cargo install --path . and
run it directly:
cargo run --bin sbe_gen -- -i path/to/my-schema.xml -o src/sbe
Either approach creates a module in src/sbe containing one Rust file per
message defined in the schema. Each file starts with common
imports and includes a parse_prefix helper on each message type.
For a more complete guide with end-to-end examples (fixed-size and variable-size messages), see docs/USAGE.md.
For example, given a packet header message like:
<message name="PacketHdr" id="0" blockLength="12">
    <field name="seq" id="1" type="uint32" />
    <field name="sending_time" id="2" type="uint64" />
</message>
the generator produces the following Rust code:
use zerocopy::{Ref, FromBytes, IntoBytes, KnownLayout, Immutable, Unaligned};
use zerocopy::byteorder::little_endian::{U32, U64};
#[repr(C)]
#[derive(Debug, FromBytes, IntoBytes, KnownLayout, Immutable, Unaligned, Clone, Copy)]
pub struct PacketHdr {
    pub seq: U32,
    pub sending_time: U64,
}
impl PacketHdr {
    #[inline]
    pub fn parse_prefix(body: &[u8]) -> Option<(&Self, &[u8])> {
        Ref::<_, Self>::from_prefix(body)
            .ok()
            .map(|(r, b)| (Ref::into_ref(r), b))
    }
}
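For readers unfamiliar with zerocopy, the prefix-splitting idea can be sketched with the standard library alone. This is a hand-written illustration, not generator output: PacketHdrValues and parse_prefix_by_hand are hypothetical names, and unlike the generated parse_prefix it copies the two fields out instead of returning a reference into the buffer.

```rust
/// Size of the PacketHdr block from the schema above: uint32 + uint64.
pub const PACKET_HDR_LEN: usize = 12;

/// Decoded copy of the two PacketHdr fields (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PacketHdrValues {
    pub seq: u32,
    pub sending_time: u64,
}

/// Splits `body` into a decoded 12-byte prefix and the remaining bytes.
/// Returns `None` when fewer than 12 bytes are available.
pub fn parse_prefix_by_hand(body: &[u8]) -> Option<(PacketHdrValues, &[u8])> {
    if body.len() < PACKET_HDR_LEN {
        return None;
    }
    let (head, rest) = body.split_at(PACKET_HDR_LEN);

    // Copy each field into a fixed-size array, then decode as little-endian.
    let mut seq_raw = [0u8; 4];
    seq_raw.copy_from_slice(&head[0..4]);
    let mut time_raw = [0u8; 8];
    time_raw.copy_from_slice(&head[4..12]);

    let values = PacketHdrValues {
        seq: u32::from_le_bytes(seq_raw),
        sending_time: u64::from_le_bytes(time_raw),
    };
    Some((values, rest))
}
```

The returned remainder is what you would hand to the next parser in the chain, which is the same shape the generated helper exposes.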
Build-time integration (build.rs)
If you want generated modules to live inside your crate (like
examples/cme_mdp3_pcap_dump), use a build script that runs the
generator at compile time.
- Add the build dependency:
[build-dependencies]
sbe_gen = "0.6.1"
- Organize schemas under schemas/<schema_name>/:
schemas/
  my_schema/
    templates_FixBinary.xml # preferred name
If templates_FixBinary.xml is not present, the build script below
expects exactly one .xml file in that directory.
- Add build.rs that emits code into src/generated/<schema_name> and writes src/generated/mod.rs:
use std::env;
use std::fs;
use std::path::{Path, PathBuf};
use sbe_gen::{generate_to, GeneratorOptions};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let manifest_dir = PathBuf::from(env::var("CARGO_MANIFEST_DIR")?);
    let schemas_dir = manifest_dir.join("schemas");
    let generated_root = manifest_dir.join("src/generated");
    println!("cargo:rerun-if-changed=build.rs");
    println!("cargo:rerun-if-changed={}", schemas_dir.display());
    fs::create_dir_all(&generated_root)?;
    // Every subdirectory of schemas/ is treated as one schema.
    let mut schema_dirs: Vec<PathBuf> = fs::read_dir(&schemas_dir)?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.is_dir())
        .collect();
    schema_dirs.sort();
    let opts = GeneratorOptions::default();
    let mut modules = Vec::new();
    for schema_dir in schema_dirs {
        let schema_name = schema_dir
            .file_name()
            .and_then(|s| s.to_str())
            .ok_or("invalid schema directory name")?
            .to_string();
        let schema_xml = pick_schema_xml(&schema_dir)?;
        println!("cargo:rerun-if-changed={}", schema_xml.display());
        let schema_contents = fs::read_to_string(&schema_xml)?;
        let output_dir = generated_root.join(&schema_name);
        fs::create_dir_all(&output_dir)?;
        generate_to(&schema_contents, &output_dir, &opts)?;
        modules.push(schema_name);
    }
    write_generated_mod(&generated_root, &modules)?;
    Ok(())
}
// Prefer templates_FixBinary.xml; otherwise require exactly one .xml file.
fn pick_schema_xml(dir: &Path) -> Result<PathBuf, Box<dyn std::error::Error>> {
    let preferred = dir.join("templates_FixBinary.xml");
    if preferred.is_file() {
        return Ok(preferred);
    }
    let mut xml_files: Vec<PathBuf> = fs::read_dir(dir)?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.extension().map(|ext| ext == "xml").unwrap_or(false))
        .collect();
    xml_files.sort();
    match xml_files.len() {
        0 => Err(format!("no XML schema file found in {}", dir.display()).into()),
        1 => Ok(xml_files.remove(0)),
        _ => Err(format!(
            "multiple XML files found in {}, pick one via templates_FixBinary.xml",
            dir.display()
        )
        .into()),
    }
}
// Writes src/generated/mod.rs with one `pub mod` line per schema.
fn write_generated_mod(root: &Path, modules: &[String]) -> Result<(), Box<dyn std::error::Error>> {
    let mut buf = String::from("// @generated by build.rs; do not edit\n");
    for module in modules {
        buf.push_str(&format!("pub mod {};\n", module));
    }
    fs::write(root.join("mod.rs"), buf)?;
    Ok(())
}
- Expose and use the generated modules:
pub mod generated;
use crate::generated::my_schema::packet_hdr::PacketHdr;
If you only have one schema, you can simplify the build script to read a single XML file instead of scanning subdirectories.
Groups and variable data
Groups are emitted as iterators layered on top of the raw buffer, and
variable‑length data fields come back as lightweight VarData<'a>
wrappers you can inspect as bytes or as UTF‑8 strings.
<message name="Book" id="1" blockLength="4">
    <field name="seq" id="1" type="uint32" />
    <group name="Levels" id="2" blockLength="16" dimensionType="groupSize">
        <field name="price" id="1" type="int64" />
        <field name="qty" id="2" type="int64" />
        <data name="note" id="3" type="varStringEncoding" />
    </group>
    <data name="raw" id="4" type="varStringEncoding" />
</message>
The generated module exposes clear, chainable helpers:
use sbe::book::*;
let (book, rest) = Book::parse_prefix(bytes).expect("prefix");
// Parse the Levels group
let levels = parse_levels(rest).expect("levels header");
for level in levels.iter() {
    let price = level.price().map(|v| v.get());
    let qty = level.qty().map(|v| v.get());
    let note = level.note.as_str();
}
let after_levels = levels.iter().remainder();
// Parse trailing variable data
let (raw, tail) = book.parse_raw(after_levels).expect("raw data");
let raw_str = raw.as_str();
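As background for what a group view has to do, the following std-only sketch walks a repeating group by hand. It rests on assumptions that are not taken from this crate: the group dimension header is taken to be a little-endian uint16 blockLength followed by a uint16 numInGroup, and entries are simplified to the two fixed int64 fields only (the note data field from the schema above is left out). Level, read_u16_le, read_i64_le and parse_fixed_group are hypothetical names, and the sketch collects into a Vec for brevity where the generated code exposes an iterator.

```rust
/// One decoded entry: price and qty are both int64 in the Levels group.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Level {
    pub price: i64,
    pub qty: i64,
}

fn read_u16_le(buf: &[u8], at: usize) -> Option<u16> {
    let b = buf.get(at..at + 2)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

fn read_i64_le(buf: &[u8], at: usize) -> Option<i64> {
    let b = buf.get(at..at + 8)?;
    let mut raw = [0u8; 8];
    raw.copy_from_slice(b);
    Some(i64::from_le_bytes(raw))
}

/// Walks a group of fixed-size entries and returns them with the unread tail.
/// Assumed header layout: blockLength (u16 LE), numInGroup (u16 LE).
pub fn parse_fixed_group(buf: &[u8]) -> Option<(Vec<Level>, &[u8])> {
    let block_length = read_u16_le(buf, 0)? as usize;
    let count = read_u16_le(buf, 2)? as usize;
    if block_length < 16 {
        return None; // entry too short to hold price + qty
    }
    let mut rest = buf.get(4..)?;
    let mut levels = Vec::with_capacity(count);
    for _ in 0..count {
        if rest.len() < block_length {
            return None; // truncated group
        }
        // Stride by the advertised block length, not by the compiled size,
        // so entries extended by a newer producer are skipped over correctly.
        let (entry, tail) = rest.split_at(block_length);
        levels.push(Level {
            price: read_i64_le(entry, 0)?,
            qty: read_i64_le(entry, 8)?,
        });
        rest = tail;
    }
    Some((levels, rest))
}
```

The tail returned here plays the same role as the remainder in the snippet above: it is where parsing of the trailing variable data would start.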
Fast path for fixed schema/version
By default, decode with the has_* checks and accessor methods, because that
path is safe across schema evolution (shorter acting_block_length, older
acting_version, constant fields, etc.).
If your producer's schema and version are pinned, you can guard once and
then read view.body directly to avoid per-field presence checks:
let (hdr, body) = MessageHeader::parse_prefix(frame).expect("header");
let (view, rest) = sbe::book::parse_with_header(body, &hdr).expect("book");
assert!(rest.is_empty());
// Safe-by-default, schema-evolution path.
let seq = view.seq().map(|v| v.get());
// Optional fast path for fixed layout streams.
if view.is_fixed_layout() {
    let msg = &*view.body;
    let seq_fast = msg.seq.get();
    let _ = seq_fast;
}
Group entries follow the same pattern: check
entry.acting_block_length >= size_of::<Entry>() before using
&*entry.body directly.
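The reasoning behind the two paths can be sketched without the crate. The standard SBE message header is four uint16 fields (blockLength, templateId, schemaId, version); little-endian byte order is assumed below. Header, parse_header, field_present and matches_compiled_layout are hypothetical names for illustration, and the exact condition sbe_gen uses for is_fixed_layout() may differ from the one shown.

```rust
/// The standard SBE message header: four uint16 fields, 8 bytes total.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub block_length: u16,
    pub template_id: u16,
    pub schema_id: u16,
    pub version: u16,
}

/// Decodes the 8-byte header (little-endian assumed) and returns the body.
pub fn parse_header(frame: &[u8]) -> Option<(Header, &[u8])> {
    if frame.len() < 8 {
        return None;
    }
    let (h, body) = frame.split_at(8);
    let u16_at = |i: usize| u16::from_le_bytes([h[i], h[i + 1]]);
    let header = Header {
        block_length: u16_at(0),
        template_id: u16_at(2),
        schema_id: u16_at(4),
        version: u16_at(6),
    };
    Some((header, body))
}

/// Per-field check: the field must have existed in the producer's schema
/// version and must lie wholly inside the block the producer wrote.
pub fn field_present(hdr: &Header, offset: usize, size: usize, since_version: u16) -> bool {
    hdr.version >= since_version && offset + size <= hdr.block_length as usize
}

/// One-time guard (an assumed condition): the producer wrote exactly the
/// layout this code was compiled for, so per-field checks can be skipped.
pub fn matches_compiled_layout(hdr: &Header, compiled_block_length: u16, compiled_version: u16) -> bool {
    hdr.block_length == compiled_block_length && hdr.version == compiled_version
}
```

With a pinned producer the guard is evaluated once per message, whereas the per-field check runs on every access; that difference is what the fast path trades for safety under schema evolution.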
Encoding with builders
Every message module includes a builder that writes the fixed block,
groups and variable data with the correct padding, offsets and length
prefixes. Builders accept native Rust numeric types and take care of the
endianness for you.
Variable-length field setters return a Result, which is an error when the
payload is too long to be described by the length prefix type.
use sbe::book::*;
let mut builder = BookBuilder::new();
builder.seq(123);
builder.levels(|levels| {
    levels.entry(|entry| {
        entry.price(101_500);
        entry.qty(10);
        entry.note(b"resting").expect("note");
    });
});
builder.raw(b"payload").expect("raw");
// Emit the message framed with the standard SBE header
let framed = builder.finish_with_header(); // Vec<u8>
// or if you only need the body:
// let body = builder.finish(); // Vec<u8>
// Zero-allocation path for hot loops:
let mut dst = [0u8; 256];
let written = Book::encode_body_into(&mut dst, |enc| {
    enc.seq(123);
    enc.raw(b"payload")?;
    Ok(())
})?;
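To show why a variable-length setter needs to be fallible, here is a std-only sketch of encoding into a caller-supplied buffer. It simplifies the Book message to seq plus one data field (no group) and assumes a little-endian uint16 length prefix; the prefix width in a real schema depends on how varStringEncoding is defined. EncodeError and encode_seq_and_raw are hypothetical names, not part of the generated API.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum EncodeError {
    /// The destination slice cannot hold the encoded message.
    BufferTooSmall,
    /// The payload length does not fit in the length prefix type.
    PayloadTooLong,
}

/// Writes `seq` (uint32 LE), then a u16 length prefix and the payload bytes.
/// Returns the number of bytes written; performs no heap allocation.
pub fn encode_seq_and_raw(dst: &mut [u8], seq: u32, raw: &[u8]) -> Result<usize, EncodeError> {
    // A u16 prefix can describe at most 65_535 payload bytes.
    if raw.len() > u16::MAX as usize {
        return Err(EncodeError::PayloadTooLong);
    }
    let len = raw.len() as u16;

    let total = 4 + 2 + raw.len();
    if dst.len() < total {
        return Err(EncodeError::BufferTooSmall);
    }

    dst[0..4].copy_from_slice(&seq.to_le_bytes());
    dst[4..6].copy_from_slice(&len.to_le_bytes());
    dst[6..total].copy_from_slice(raw);
    Ok(total)
}
```

Both failure modes depend on runtime input, which is why returning an error is a better fit here than a panic.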
CME MDP3 pcap dump example
This repository includes a standalone example crate that generates CME
MDP3 decoders from examples/cme_mdp3_pcap_dump/schemas/cme_mdp3/templates_FixBinary.xml
and dumps Market-by-Order packets from a pcap file:
cargo run --manifest-path examples/cme_mdp3_pcap_dump/Cargo.toml -- \
--pcap /path/to/file.pcap
The example supports filters (--src-port, --dst-port, --udp-port,
--src, --dst) and --limit. The pcap crate requires libpcap
headers to be installed on your system.
Status and limitations
This project is a work‑in‑progress. The generator covers the core SBE types (primitives, enums, sets, composites, groups and variable data) along with the standard message header and byte‑order rules. Notable spec features that are still missing:
- Optional composites are treated as required; optional handling is only emitted for primitives, enums and sets.
- Group-entry versioning currently uses block-length based presence checks for fixed fields in entry views; explicit entry-level sinceVersion gating is not emitted yet.
- Value constraints for builders/encoders are enforced via assert! checks and therefore panic on violation (rather than returning recoverable validation errors).
- Constant fields remain const definitions rather than struct members.
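Until builders return validation errors, one workaround is to range-check values before handing them to a setter. The sketch below is a caller-side helper only; OutOfRange and check_range are hypothetical names, and the bounds would come from the constraint constants your generated code exposes.

```rust
/// Describes a value that falls outside an inclusive [min, max] constraint.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfRange {
    pub value: i64,
    pub min: i64,
    pub max: i64,
}

/// Returns the value unchanged when it satisfies the constraint,
/// or a recoverable error instead of panicking.
pub fn check_range(value: i64, min: i64, max: i64) -> Result<i64, OutOfRange> {
    if value < min || value > max {
        Err(OutOfRange { value, min, max })
    } else {
        Ok(value)
    }
}
```

A call site would then pass the checked value to the setter, for example using check_range(qty, 0, 1_000_000)? before entry.qty(...), with the bounds here chosen purely for illustration.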
Contributions to extend the generator are welcome. See the
src/parser.rs and src/codegen.rs modules for the implementation.
Dependencies
~1.1–1.7MB
~32K SLoC