
# subsume


Geometric region embeddings for subsumption, entailment, and logical query answering. Boxes, cones, octagons, Gaussians, hyperbolic intervals, and sheaf networks. Ndarray and Candle backends. The only published Rust crate for geometric knowledge graph embeddings.

*Box embedding concepts. (a) Containment: nested boxes encode taxonomic is-a relationships. (b) Gumbel soft boundary: temperature controls membership sharpness. (c) Octagon: diagonal constraints cut corners for tighter volume bounds.*

## What it provides

### Geometric primitives

| Component | What it does |
| --- | --- |
| `Box` trait | Axis-aligned hyperrectangle: volume, containment, overlap, distance |
| `GumbelBox` trait | Probabilistic boxes via Gumbel random variables (dense gradients, no flat regions; Dasgupta et al., 2020) |
| `NdarrayCone` | Angular cones in d-dimensional space: containment via aperture, closed under negation (Zhang & Wang, NeurIPS 2021) |
| `NdarrayOctagon` | Axis-aligned polytopes with diagonal constraints; tighter volume bounds than boxes (Charpenay & Schockaert, IJCAI 2024) |
| `gaussian` | Diagonal Gaussian boxes: KL divergence (asymmetric containment) and Bhattacharyya coefficient (symmetric overlap) |
| `hyperbolic` | Poincare ball embeddings and hyperbolic box intervals (via `hyperball`) |
| `sheaf` | Sheaf diffusion primitives: stalks, restriction maps, Laplacian (Hansen & Ghrist 2019; Bodnar et al., ICLR 2022) |
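As a concrete reference for what the `Box` trait computes, here is a minimal sketch in plain Rust (the `SimpleBox` type is hypothetical, not the crate's API): volume is the product of per-dimension side lengths, and containment is a per-dimension interval check.

```rust
// Hypothetical minimal box type for illustration -- not the crate's API.
struct SimpleBox {
    min: Vec<f64>, // lower corner, one entry per dimension
    max: Vec<f64>, // upper corner, one entry per dimension
}

impl SimpleBox {
    /// Volume = product of per-dimension side lengths (clamped at 0).
    fn volume(&self) -> f64 {
        self.min
            .iter()
            .zip(&self.max)
            .map(|(lo, hi)| (hi - lo).max(0.0))
            .product()
    }

    /// `self` contains `other` iff every interval of `other`
    /// sits inside the corresponding interval of `self`.
    fn contains(&self, other: &SimpleBox) -> bool {
        self.min
            .iter()
            .zip(&self.max)
            .zip(other.min.iter().zip(&other.max))
            .all(|((a_lo, a_hi), (b_lo, b_hi))| a_lo <= b_lo && b_hi <= a_hi)
    }
}

fn main() {
    let animal = SimpleBox { min: vec![0.0, 0.0], max: vec![1.0, 1.0] };
    let dog = SimpleBox { min: vec![0.2, 0.2], max: vec![0.8, 0.8] };
    println!("volume = {}", animal.volume());
    println!("animal contains dog: {}", animal.contains(&dog));
}
```

The actual trait adds soft/probabilistic variants of these operations; the hard versions above are just the geometric baseline.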

### Scoring and query answering

| Component | What it does |
| --- | --- |
| BoxE scoring | Point-entity BoxE model (Abboud et al., 2020) + box-to-box variant |
| `distance` | Depth-based (RegD), boundary, and vector-to-box distance metrics |
| `fuzzy` | Fuzzy t-norms/t-conorms for logical query answering (FuzzQE; Chen et al., AAAI 2022) |
| `el` | EL++ ontology embedding: inclusion loss, role translation/composition, existential boxes, disjointness (Box2EL/TransBox) |

### Taxonomy and training

| Component | What it does |
| --- | --- |
| `taxonomy` | TaxoBell-format dataset loader: `.terms`/`.taxo` parsing, train/val/test splitting |
| `taxobell` | TaxoBell combined loss: Bhattacharyya triplet + KL containment + volume regularization + sigma clipping |
| Training utilities | Negative sampling, temperature scheduling, AMSGrad optimizer |
| Evaluation | MRR, Hits@k, NDCG, calibration (ECE, Brier), reliability diagrams |

### Backends

| Component | What it does |
| --- | --- |
| `NdarrayBox` / `NdarrayGumbelBox` / `NdarrayCone` / `NdarrayOctagon` | CPU backend using `ndarray::Array1<f32>` |
| `CandleBox` / `CandleGumbelBox` | GPU/Metal backend using `candle_core::Tensor` |

The ndarray backend is the primary workhorse with full geometry support. The candle backend provides GPU-accelerated box operations for training workflows.

## Usage

```toml
[dependencies]
subsume = { version = "0.2.1", features = ["ndarray-backend"] }
ndarray = "0.16"
```

```rust
use subsume::ndarray_backend::NdarrayBox;
use subsume::Box as BoxTrait;
use ndarray::array;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Box A: [0,0,0] to [1,1,1] (general concept)
    let premise = NdarrayBox::new(array![0., 0., 0.], array![1., 1., 1.], 1.0)?;

    // Box B: [0.2,0.2,0.2] to [0.8,0.8,0.8] (specific, inside A)
    let hypothesis = NdarrayBox::new(array![0.2, 0.2, 0.2], array![0.8, 0.8, 0.8], 1.0)?;

    // Containment probability: P(B inside A)
    let p = premise.containment_prob(&hypothesis, 1.0)?;
    assert!(p > 0.9);
    Ok(())
}
```

## Examples

```shell
cargo run -p subsume --example containment_hierarchy    # taxonomic is-a relationships with nested boxes
cargo run -p subsume --example gumbel_box_exploration   # Gumbel boxes, soft containment, temperature effects
cargo run -p subsume --example cone_training            # training cone embeddings on a taxonomy
cargo run -p subsume --example box_training             # training box embeddings on a 25-entity taxonomy
cargo run -p subsume --example taxobell_demo            # TaxoBell Gaussian box losses on a mini taxonomy
cargo run -p subsume --example query2box                # Query2Box: multi-hop queries, box intersection, distance scoring
cargo run -p subsume --example octagon_demo             # octagon embeddings: diagonal constraints, containment, volume
cargo run -p subsume --example fuzzy_query              # fuzzy query answering: t-norms, De Morgan duality, rankings
cargo run -p subsume --example el_training              # EL++ box embeddings on a biomedical-style ontology
cargo run -p subsume --features candle-backend --example taxobell_training  # TaxoBell MLP encoder training (Candle)
```

See `examples/README.md` for a guide to choosing the right example.

## Tests

```shell
cargo test -p subsume
```

Unit, property, and doc tests covering:

- Box geometry: intersection, union, containment, overlap, distance, volume, truncation
- Gumbel boxes: membership probability, temperature edge cases, Bessel volume
- Cones: angular containment, negation closure, aperture bounds
- Octagon: intersection closure, containment, Sutherland-Hodgman volume
- Fuzzy: t-norm/t-conorm commutativity, associativity, De Morgan duality
- Gaussian boxes, EL++ ontology losses, sheaf networks, hyperbolic geometry
- Training: MRR, Hits@k, NDCG, calibration, negative sampling, AMSGrad
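The De Morgan duality exercised by the fuzzy tests is easy to sketch with the product t-norm/t-conorm pair (illustrative helpers in plain Rust, not the crate's API):

```rust
// Product t-norm family (illustrative, not the crate's API):
// AND is multiplication, OR is the probabilistic sum, NOT is 1 - x.
fn t_norm(a: f64, b: f64) -> f64 {
    a * b // fuzzy AND
}

fn t_conorm(a: f64, b: f64) -> f64 {
    a + b - a * b // fuzzy OR
}

fn neg(a: f64) -> f64 {
    1.0 - a // fuzzy NOT
}

fn main() {
    let (a, b) = (0.7, 0.4);
    // De Morgan duality: not(a AND b) == not(a) OR not(b)
    let lhs = neg(t_norm(a, b));
    let rhs = t_conorm(neg(a), neg(b));
    assert!((lhs - rhs).abs() < 1e-12);
    println!("both sides = {lhs:.3}");
}
```

With fuzzy membership scores in [0, 1] coming from the geometric layer, these connectives are what turn region scores into answers for conjunctive, disjunctive, and negated queries.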

## Choosing a geometry

| Geometry | When to use it | Closed under ¬? | Key tradeoff |
| --- | --- | --- | --- |
| Box / GumbelBox | Containment hierarchies, each dimension independent | No | Simple, fast; Gumbel adds dense gradients where hard boxes have zero gradient |
| Cone | Multi-hop queries requiring negation (FOL with ¬) | Yes | Closed under complement; angular parameterization harder to initialize |
| Octagon | Rule-aware KG completion; tighter containment than boxes | No | Diagonal constraints cut box corners; more parameters per entity |
| Gaussian | Taxonomy expansion with uncertainty (TaxoBell) | No | KL = asymmetric containment; Bhattacharyya = symmetric overlap |
| Hyperbolic | Tree-like hierarchies with exponential branching | No | High capacity in low dimensions; numerical care needed near the Poincare ball boundary |
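The "KL = asymmetric containment" tradeoff in the Gaussian row can be seen directly in the 1-D closed form for KL divergence between Gaussians (an illustrative sketch, not the crate's implementation):

```rust
// KL(N(mu1, s1^2) || N(mu2, s2^2)) in closed form for 1-D Gaussians.
// Illustrative only -- the crate works with diagonal multivariate Gaussians.
fn kl_1d(mu1: f64, s1: f64, mu2: f64, s2: f64) -> f64 {
    (s2 / s1).ln() + (s1 * s1 + (mu1 - mu2).powi(2)) / (2.0 * s2 * s2) - 0.5
}

fn main() {
    // Narrow "child" concept centered inside a broad "parent" concept.
    let child_in_parent = kl_1d(0.0, 0.5, 0.0, 2.0); // small: child fits in parent
    let parent_in_child = kl_1d(0.0, 2.0, 0.0, 0.5); // large: parent does not fit in child
    println!("{child_in_parent:.3} vs {parent_in_child:.3}");
    assert!(child_in_parent < parent_in_child); // asymmetry encodes direction
}
```

This directionality is exactly what a symmetric overlap score (like the Bhattacharyya coefficient) cannot provide on its own, which is why the TaxoBell loss combines both.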

## Why regions instead of points?

Point embeddings (TransE, RotatE, ComplEx) represent entities as vectors. They work well for link prediction -- RotatE hits 0.476 MRR on WN18RR, BoxE hits 0.451. For standard triple scoring, points are simpler and equally accurate.

Regions become necessary when the task requires structure that points cannot encode:

| What you need | Points | Regions |
| --- | --- | --- |
| Containment (A ⊆ B) | No -- points have no interior | Box nesting = subsumption |
| Volume = generality | No -- points have no size | Large box = broad concept |
| Intersection (A ∧ B) | No set operations | Box ∩ Box = another box |
| Negation (¬A) | No complement | Cone complement = another cone |
| Uncertainty per dimension | No | Gaussian sigma |
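The intersection row rests on a closure property that is worth seeing concretely: the intersection of two axis-aligned boxes is again an axis-aligned box (elementwise max of lower corners, elementwise min of upper corners). A plain-Rust sketch (hypothetical helper, not the crate's API):

```rust
// Intersect two axis-aligned boxes given as (min, max) corner slices.
// Hypothetical helper for illustration -- not the crate's API.
fn intersect(a: (&[f64], &[f64]), b: (&[f64], &[f64])) -> (Vec<f64>, Vec<f64>) {
    // Elementwise max of mins, elementwise min of maxes.
    let min: Vec<f64> = a.0.iter().zip(b.0).map(|(x, y)| x.max(*y)).collect();
    let max: Vec<f64> = a.1.iter().zip(b.1).map(|(x, y)| x.min(*y)).collect();
    (min, max) // empty box if min[i] > max[i] in any dimension
}

fn main() {
    let a = (vec![0.0, 0.0], vec![1.0, 1.0]);
    let b = (vec![0.5, 0.5], vec![2.0, 2.0]);
    let (min, max) = intersect((&a.0, &a.1), (&b.0, &b.1));
    println!("{min:?} to {max:?}"); // [0.5, 0.5] to [1.0, 1.0]
}
```

Because the result is the same kind of object, conjunctive queries compose: intersecting answer boxes for each query atom yields a box that can itself be scored, intersected again, or checked for containment.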

Three tasks where point embeddings structurally fail:

  1. Ontology completion (EL++): "Dog is-a Animal" requires representing one concept's extension as a subset of another's. Points have no containment. Box2EL, TransBox, and DELE use boxes for this and outperform point baselines on Gene Ontology, GALEN, and Anatomy by wide margins.

  2. Logical query answering (∧, ∨, ¬): multi-hop KG queries with conjunction, disjunction, and negation need set operations. ConE handles all three (MRR 52.9 on FB15k EPFO+negation queries vs Query2Box's 41.0 and BetaE's 44.6). Points cannot attempt negation queries at all.

  3. Taxonomy expansion: inserting a new concept at the right depth requires knowing both what it is (similarity) and how general it is (volume). TaxoBell uses Gaussian boxes where KL divergence gives asymmetric parent-child containment for free.

If your task is link prediction or entity similarity, use RotatE. If you need containment, set operations, or volume, you need regions.

## Why Gumbel boxes?

*Gumbel noise robustness: containment loss under increasing coordinate noise for Gumbel, Gaussian, and hard boxes. Gumbel boxes remain stable at perturbation levels where other formulations fail.*

Gumbel boxes model coordinates as Gumbel random variables, creating soft boundaries that provide dense gradients throughout training. Hard boxes create flat regions where gradients vanish; Gumbel boxes solve this local identifiability problem (Dasgupta et al., 2020). As shown above, this also makes containment robust to coordinate noise -- Gumbel containment loss stays near zero even at high perturbation levels where Gaussian boxes fail completely.
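The dense-gradient point can be sketched with a softplus surrogate for side length, in the spirit of the Gumbel soft-volume from Dasgupta et al. (2020) -- this is a simplified illustration, not the crate's exact formula:

```rust
// Soft side length via softplus (simplified Gumbel-style surrogate;
// not the crate's exact volume formula).
fn softplus(x: f64, beta: f64) -> f64 {
    beta * (1.0 + (x / beta).exp()).ln()
}

// Smooth surrogate for the hard side length max(hi - lo, 0).
fn soft_side(lo: f64, hi: f64, beta: f64) -> f64 {
    softplus(hi - lo, beta)
}

fn main() {
    let beta = 0.1; // temperature: smaller beta -> sharper boundary
    // Degenerate interval where hi < lo: the hard side length is exactly 0,
    // so the hard gradient is also 0 and training stalls.
    let hard = (0.4f64 - 0.6).max(0.0);
    let soft = soft_side(0.6, 0.4, beta);
    println!("hard = {hard}, soft = {soft:.4}");
    assert_eq!(hard, 0.0);
    assert!(soft > 0.0); // softplus stays positive, so gradients flow
}
```

Because softplus is strictly positive and strictly increasing, its derivative never vanishes: even fully disjoint or inverted boxes receive a training signal pushing them toward the right configuration.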

### Training convergence

*25-entity taxonomy learned over 200 epochs. Left: total violation drops three orders of magnitude. Right: containment probabilities converge to 1.0 at different rates depending on hierarchy depth. Reproduce with `cargo run --example box_training` or `uv run scripts/plot_training.py`.*

## References

- Nickel & Kiela (2017). "Poincare Embeddings for Learning Hierarchical Representations"
- Vilnis et al. (2018). "Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures"
- Li et al. (2019). "Smoothing the Geometry of Probabilistic Box Embeddings" (ICLR 2019)
- Abboud et al. (2020). "BoxE: A Box Embedding Model for Knowledge Base Completion"
- Dasgupta et al. (2020). "Improving Local Identifiability in Probabilistic Box Embeddings"
- Ren et al. (2020). "Query2Box: Reasoning over Knowledge Graphs using Box Embeddings"
- Hansen & Ghrist (2019). "Toward a Spectral Theory of Cellular Sheaves"
- Bodnar et al. (2022). "Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs"
- Zhang & Wang (2021). "ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs"
- Chen et al. (2022). "Fuzzy Logic Based Logical Query Answering on Knowledge Graphs"
- Jackermeier et al. (2023). "Dual Box Embeddings for the Description Logic EL++"
- Yang, Chen & Sattler (2024). "TransBox: EL++-closed Ontology Embedding"
- Bourgaux et al. (2024). "Knowledge Base Embeddings: Semantics and Theoretical Properties" (KR 2024)
- Charpenay & Schockaert (2024). "Capturing Knowledge Graphs and Rules with Octagon Embeddings"
- Lacerda et al. (2024). "Strong Faithfulness for ELH Ontology Embeddings" (TGDK 2024)
- Huang et al. (2023). "Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs"
- Mashkova et al. (2024). "DELE: Deductive EL++ Embeddings for Knowledge Base Completion"
- Yang & Chen (2025). "Achieving Hyperbolic-Like Expressiveness with Arbitrary Euclidean Regions"
- Mishra et al. (2026). "TaxoBell: Gaussian Box Embeddings for Self-Supervised Taxonomy Expansion" (WWW '26)

## See also

- `hyperball` -- hyperbolic geometry primitives (direct dependency for Poincare/Lorentz embeddings)

## License

MIT OR Apache-2.0
